Brief Reports


The Journal of Consulting Psychology will 
accept Brief Reports of research studies in 
clinical psychology for early publication with- 
out expense to the author. The procedure is 
intended to permit the publication of soundly 
designed studies of specialized interest or lim- 
ited importance which cannot now be ac- 
cepted because of lack of space. Several pages 
in each issue will be devoted to Brief Reports, 
published in the order of their receipt with- 
out respect to the dates of receipt of the regu- 
lar articles. Most Brief Reports appear in the 
first or second issue to go to press following 
their final acceptance. 

An author who wishes to submit a Brief 
Report: 


1. Sends the Brief Report, limited to one printed
page and prepared according to the specifications
given below.


2. Also sends to the Editor a full report of the re- 
search study, in sufficient detail to give a clear ac- 
count of its background, procedure, results, and con- 
clusions, which will be filed with the American 
Documentation Institute to insure indefinite avail- 
ability. 

3. Prepares at least 100 mimeographed copies of
the full report, which the author will send without
charge to all who request it as long as the supply
lasts.


4. Agrees not to submit the full report to another
journal of general circulation.


Specifications


Brief Report. The Brief Report should give 
a clear, condensed summary of the procedure 
of the study and as full an account of the re- 
sults as space permits. 


To insure that the Brief Report will be no 
longer than one printed page, its typescript, 
including all matter except the title and the 
author’s lines, must not exceed 70 lines av- 
eraging 42 characters and spaces in length. 
Set the typewriter margins for short lines of 
42 characters, which are 3.5 inches long in 
elite typing, and 4.2 inches long in pica. 

The manuscript of the Brief Report must 
be double spaced throughout. Except for its 
short lines, it follows the standard style (1). 
Headings, tables, and references are avoided 
or, if essential, must be counted in the 70 
lines. Each Brief Report must be accom- 
panied by a footnote in the style below, 
which is typed on a separate sheet and not 
counted in the 70-line quota: ¹


¹ An extended report of this study may be obtained
without charge from John Doe, 300 Market
St., Prospect 6, Mass. (giving the author’s full name 
and address), or for a fee from the American Docu- 
mentation Institute. Order Document No. —— from 
ADI Auxiliary Publications Project, Photoduplica- 
tion Service, Library of Congress, Washington 25, 
D. C., remitting in advance $—— for microfilm or
$—— for photocopies. Make checks payable to Chief, 
Photoduplication Service, Library of Congress. 


Extended report. The full report is prepared
in the style specified by the Publication
Manual (1), except that it may be typed
with single spacing for economy in photo- 
duplication by the ADI. 
49, 389-449.


Comparison of Visual, Content, and Auditory
Cues in Interviewing¹


F. Harold Giedt²
VA Hospital, Sepulveda, California 


Despite the wide use and importance of the 
interview as a method of judging personality, 
it still remains largely an artistic technique 
rather than a scientific tool. Because of con- 
siderable “faith validity” and more modest 
empirical validity, the interview is most prob- 
ably here to stay. Consequently, the best re- 
search approach would seem to be to deter- 
mine in what way accurate or inaccurate in- 
formation is obtained through interviewing, 
rather than to try to prove merely whether or 
not it works. 

This study was designed, among other 
things, to determine which of the several 
forms of observations available to the clinician 
in the interview situation seems most related 
to making accurate judgments of a patient. 
The plan was to control experimentally the 
sorts of observations available to the evaluat- 
ing clinician by means of interviews recorded 
on sound film. By this procedure the clinician 
could be presented with (a) visual cues (silent
film), (b) verbal content (verbatim written
transcript), (c) content plus auditory cues 
(sound alone), or (d) content plus auditory 
and visual cues (complete sound film). 


Procedure 


Interviews. Two clinicians interviewed four
patients, producing eight separate interviews
which were recorded on sound film. Two
interviewers, a psychologist and a psychiatrist,
were used to provide two different approaches
to interviewing. Each interviewer was
instructed at least to touch on the patient’s
home adjustment, work adjustment, educational
achievement, and his present illness.
Both were also acquainted with the personality
dimensions that were later to be rated
on the basis of the recorded interviews.

¹ From the Veterans Administration Hospital, Long
Beach, California.

² This study is a revision of a doctoral dissertation
submitted in 1951 to the Department of Psychology,
University of California at Los Angeles. I should
like to express my gratitude to the many professional
clinicians who so kindly allowed their evaluative
skills to be examined. Also I would like to thank
Drs. James F. T. Bugental, Roy M. Dorcus, and
John P. Seward for their aid and encouragement.

Four male patients were used in this study 
to provide some representation, albeit limited, 
of divergent personalities and modes of ad- 
justment or maladjustment. Three moderately 
disturbed, voluntarily hospitalized, neuropsy- 
chiatric patients and one general medical pa- 
tient were chosen. A brief description of each 
patient follows: 


Patient A was a young veteran diagnosed schizo- 
phrenic reaction, paranoid type, moderate. 

Patient B, a middle-aged man, was hospitalized 
for ulcer-like symptoms but later was diagnosed as 
having gastritis and a hypertrophic body in the 
antrum of the stomach. He represented a pleasant, 
near normal subject. 

Patient C was a young veteran diagnosed anxiety 
reaction, acute, with marked tension, irritability, 
mild depression, and battle dreams. 

Patient D was a middle-aged man who was first 
hospitalized for alcoholic gastritis and a possible 
ulcer, but was later diagnosed as schizophrenic reac- 
tion, unclassified, with a depressive reaction and 
chronic alcoholism. 


The interviews were all filmed in one after- 
noon. They were conducted in an office with 
a one-way mirror. Camera, sound equipment, 
and technical personnel were located in an 
adjacent room out of sight of the patients. 
There were about twenty minutes for the pa- 
tient to become accustomed to the lights and 
microphone equipment and for the interviewer 
to develop some rapport with the patient before
the formal interview, lasting twelve minutes,
began. It was the consensus
of those involved that all the patients re- 
sponded well to the interviews and did not 
appear to be particularly defensive, embar- 
rassed, or exhibitionistic. 

Evaluation measures. Given some sample 
of a person’s behavior as it occurs in an inter- 
view, there are a number of possible judg- 
ments or conclusions that could be made 
about that person. In actual practice clini- 
cians describe a patient’s personality, predict 
his behavior, and make recommendations for 
his treatment or disposition. Such end prod- 
ucts of the clinician’s interview with a pa- 
tient are usually either put in the form of 
written reports, or are directly translated into 
action of some sort with or upon the patient. 

In an objective study of the interview it is 
important to deal with these end products in 
as close to their original form as possible. 
However, they should also be made quite 
explicit and subject to some form of quantifi- 
cation or objective analysis. Personality rat- 
ing scales and prediction of patient responses 
to sentence completion test items were chosen 
as two evaluation instruments to provide some 
measure of the impressions gained from the 
interview materials. 

Rating scales. In order to have rating scales 
that would measure those personality charac- 
teristics most commonly considered in psy- 
chological reports, case histories, and case 
conferences, a list of traits or dimensions was 
gathered from such reports and conferences, 
and each trait was defined. In consultations 
with five clinicians the original list of thirty- 
three variables was reduced to eleven which 
were considered the most necessary dimen- 
sions for describing a patient’s personality. 
To make it possible for the rater to translate 
accurately his impressions of a patient into 
marks along a line representing variation in 
a personality characteristic, distances along 
the line were made meaningful by means of 
descriptive cue statements whose positions on 
the scales were empirically determined. The 
completed rating scales were constructed so 
that there was a separate page for each rating 
scale and space was provided for listing cues 
the judge felt he used in making the rating.
Analysis of these reported cues will be
reported in a later paper.

For most of the personality characteristics 
that clinicians are called upon to evaluate 
(except intelligence), there are no stand- 
ardized tests that are widely accepted as 
criteria. The best available criterion in this 
case seemed to be agreement among five 
persons considered the best available judges 
of the personalities of the four patients. One 
judge was the chief psychiatrist at the hos- 
pital where all the patients considered were 
then under treatment. This psychiatrist was 
also one of the two interviewers in the study. 
Another judge was the psychologist who was 
the other interviewer in the study. Both made 
the personality ratings immediately following 
each interview with the four patients. Two 
other psychologists contributed criterion rat- 
ings on the basis of testing the patients. One 
administered the Rorschach and Wechsler- 
Bellevue test to all four patients. The other 
gave the Thematic Apperception Test and the 
Draw-A-Person test. The experimenter made 
another set of ratings. 

The criterion ratings, as well as all subsequent
ratings, were translated into numerical values
ranging from zero to nine. The five sets of
criterion ratings were then statistically studied 
by simple analysis of variance. This was done 
to determine whether the differences between 
mean ratings of patients were significantly 
more than might be expected on a chance 
basis, as determined from differences among 
the criterion judges’ ratings of single patients. 
It was felt that only if such agreement could 
be demonstrated could the average of the five 
judges’ ratings on one patient be considered 
stable enough to be used as a criterion rating. 
Only five of the rated personality characteristics
showed considerable agreement (p of
.001) among the judges’ ratings for each pa- 
tient and one more showed fairly close agree- 
ment (p of .02). In addition to these six 
criterion ratings the Wechsler-Bellevue total 
IQ scores were used as the criteria for the 
ratings of Intelligence. Table 1 lists the seven 
characteristics for which acceptable criteria 
were established and gives the numerical value 
of the ratings for each patient as well as the 
meaning of the rating at the lowest numerical 
value. 
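
The agreement check described above can be sketched as a simple one-way analysis of variance in which the patients are the groups and the five judges supply the replicate ratings. The ratings below are hypothetical, and the function name `one_way_anova_f` is an illustration only, not the author's computation.

```python
def one_way_anova_f(groups):
    """F ratio: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical 0-9 ratings on one trait: one list per patient, one value per judge.
ratings = {
    "A": [3, 4, 3, 2, 3],
    "B": [6, 5, 6, 7, 6],
    "C": [1, 2, 1, 1, 2],
    "D": [2, 3, 2, 2, 3],
}

f_ratio = one_way_anova_f(list(ratings.values()))
# If the judges agree well enough, the mean of their ratings serves as the criterion.
criterion = {patient: sum(v) / len(v) for patient, v in ratings.items()}
print(round(f_ratio, 2), criterion)
```

A large F ratio here means the patients differ from one another by more than the judges differ among themselves about any one patient, which is the condition the text requires before averaging the judges.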
Table 1


Numerical Values of Criterion Ratings for Each Patient

                                           Patients
Personality characteristics             A    B    C    D    Meaning of low numerical values
Intelligence                            3    6    4    2    Very superior
Anxiety                                 3    5    1    2    Overwhelmed with anxiety
Dependency-Independency                 5    7    2    3    Wholly dependent on others
Adherence to Reality                   [?]  [?]  [?]  [?]   Psychotic disregard
Direction of Feelings towards Others    6    5    5    7    Very warm feelings
Flexibility of Adjustive Techniques     7   [?]   7    7    Tries anything that works


Predicting patient sentence completions. In 
this study as in Luft’s study (4), the method 
of controlled cross-predictions was selected as 
a second measure of effectiveness in judging 
others. This method of determining a clini- 
cian’s ability to predict a patient’s responses 
to open-ended incomplete sentences has the 
advantage of providing the clinician with an 
opportunity to predict the patient’s actual be- 
havior in a very specific, known situation, 
and of having the patient’s own responses 
serve as the criteria. Its chief disadvantage 
and criticism may be that the behavior which 
is predicted has little practical significance. 
However, the assumption is that differences 
in ability to predict less crucial but still far 
from simple behavior, such as a patient’s 
response to a test item, would reveal different 
degrees of understanding of that patient. 
Specifically, the method of controlled cross- 
predictions as used in this study required the 
clinician to select out of four choices the re- 
sponse he felt a particular patient gave in 
completing an incomplete sentence. The four 
choices were the actual responses given by 
the four patients interviewed in this study. 
An example of one such prediction situation 
follows: 


Item 12. Most of all I want (stimulus portion of the 
sentence) 

—to succeed (patient C’s response). 

—peace of mind (patient D’s response). 

—travel and a good looking wife (patient A’s re- 
sponse). 

—a good job (patient B’s response). 


There were a total of thirty-five incomplete
sentences for which the clinicians had to
select the actual responses given by the
particular patient under consideration, and the
number of correct predictions constituted his
accuracy of prediction score. This measure
had a split-half reliability of .60 with the
Spearman-Brown correction.
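
As a brief illustration of the Spearman-Brown correction mentioned above, the sketch below steps a half-test correlation up to full-test length and inverts the formula. The function names are hypothetical, and the half-test correlation is not reported in the text; it is only what the corrected value of .60 would imply.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Step a part-test correlation up to the reliability of the lengthened test."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def half_correlation_from_full(r_full: float) -> float:
    """Invert the two-half formula: the half-test r implied by a corrected value."""
    return r_full / (2.0 - r_full)


# Implied (not reported) correlation between the two halves of the
# thirty-five-item prediction measure, given a corrected reliability of .60.
r_half = half_correlation_from_full(0.60)
print(round(r_half, 3), round(spearman_brown(r_half), 2))
```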


Presenting the interview material. A number
of factors had to be considered in designing
the exact manner in which the interviews
were to be presented. There were (a) the
four modes of presenting the interview material,
(b) the four patients to be judged, (c)
the two interviews with each patient, and (d)
the order in which any condition was presented.
In order to study the effects these
various independent variables had upon the
dependent variables, the two measures of
accuracy in making personality judgments,
a modified greco-latin square design was employed.
All factors were varied and controlled
so that no condition had any priority in the
order of presentation and no combination of
conditions was favored, as can be seen in
Table 2.


Table 2

Design for Presenting Interview Material

Evaluating        Order of presentation
clinicians        1      2      3      4
Group I           Aa1    Bb1    Cc2    Dd2
Group II          Bc2    Ad1    Da2    Cb1
Group III         Cd1    Dc2    Ab1    Ba2
Group IV          Db2    Ca2    Bd1    Ac1
Group V           Aa2    Bb2    Cc1    Dd1
Group VI          Bc1    Ad2    Da1    Cb2
Group VII         Cd2    Dc1    Ab2    Ba1
Group VIII        Db1    Ca1    Bd2    Ac2


Note. Capital letters (A, B, C, D) represent the four patients.
Small letters (a, b, c, d) represent the four modes of
presenting the interview material. Numbers (1, 2) represent
the two interviewers. Each group was composed of two
psychiatrists, two social workers, and two psychologists.
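
The balance claimed for this design can be checked mechanically. The sketch below takes the cell labels for Groups I to IV as they appear to read in Table 2 (an assumption where the print is unclear) and tests that patients and modes each form a Latin square and that every patient-mode pairing occurs exactly once. The helper name `is_latin` is hypothetical.

```python
from itertools import product

# Groups I-IV of Table 2; each cell is patient + mode + interviewer.
# Interviewer numbers are ignored in this check.
DESIGN = [
    ["Aa1", "Bb1", "Cc2", "Dd2"],
    ["Bc2", "Ad1", "Da2", "Cb1"],
    ["Cd1", "Dc2", "Ab1", "Ba2"],
    ["Db2", "Ca2", "Bd1", "Ac1"],
]


def is_latin(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


patients = [[cell[0] for cell in row] for row in DESIGN]
modes = [[cell[1] for cell in row] for row in DESIGN]
# Orthogonality: all sixteen patient-mode combinations appear once.
pairs = {(patients[r][c], modes[r][c]) for r, c in product(range(4), repeat=2)}

print(is_latin(patients), is_latin(modes), len(pairs) == 16)
```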

The evaluating clinicians. A most important 
group of participants in this study were the 
psychiatrists, social workers, and psycholo- 
gists, who graciously volunteered to observe 
the different sorts of interview material and 
then attempted to evaluate the patients who 
were interviewed. Altogether there were forty- 
eight such evaluating clinicians, with sixteen 
from each of the three specialties. Each group 
indicated in Table 2 was composed of a pair 
from each of the three professional groups 
with one of the pair being an experienced 
clinician and the other someone in training. 


Results 


Personality ratings. Each clinician’s rating 
of a particular patient on a particular trait 
was compared to the criterion rating for that 
patient and trait, and the amount of devia- 
tion determined. A total deviation or rating- 
error score for each clinician’s set of ratings 
on each patient was obtained by summing the 
deviations from the criterion ratings for the 
seven personality dimensions. To insure that 
the deviations for all seven dimensions would 
contribute equal weight to the total rating- 
error score, the variances of the seven dis- 
tributions of deviations were equated, and 
where necessary, the deviations were mul- 
tiplied by a constant before they were summed 
to form the total rating-error score. A low 
score indicates an accurate series of ratings. 
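
The scoring just described can be sketched as follows. Two points are assumptions rather than statements from the text: deviations are taken as absolute differences, and the variances are equated by scaling each trait's deviations to unit standard deviation. The data and the function name `total_error_scores` are hypothetical.

```python
from statistics import pstdev


def total_error_scores(ratings, criterion):
    """ratings: one list per rater, one rating per trait.
    criterion: criterion ratings in the same trait order.
    Returns one total rating-error score per rater (lower is more accurate)."""
    n_traits = len(criterion)
    # Assumed: "amount of deviation" means the absolute difference.
    devs = [[abs(r[t] - criterion[t]) for t in range(n_traits)] for r in ratings]
    # Assumed: variances are equated by multiplying each trait's deviations
    # by a constant that gives them unit standard deviation.
    scales = []
    for t in range(n_traits):
        sd = pstdev([d[t] for d in devs])
        scales.append(1.0 / sd if sd > 0 else 1.0)
    return [sum(d[t] * scales[t] for t in range(n_traits)) for d in devs]


# Hypothetical data: three raters, three traits, 0-9 scale.
criterion = [3, 5, 7]
ratings = [[3, 5, 7], [4, 7, 7], [5, 1, 4]]
scores = total_error_scores(ratings, criterion)
print([round(s, 2) for s in scores])
```

Without the scaling step, a trait on which raters scatter widely would dominate the total simply because its deviations are larger.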

In order to get an idea of the statistical 
and psychological meaning of this total error 
score, the intercorrelations between the seven 
distributions of deviations from the criterion 
ratings were determined. The average correlation
was .03, with only two positive coefficients and
one negative coefficient of the twenty-one
intercorrelations significantly different from zero
at the one per cent level. This finding suggested
two possible explanations. One was that the
different ratings are in fact quite unrelated, and
one would not expect accuracy in judging one
trait to carry over to another trait. The other
possibility was that the trait ratings are very
unreliable measures and deviation from or
approximation of the criterion is governed
largely by chance.


To determine which of these explanations is
the more applicable, the intercorrelations between
the actual ratings, disregarding the
criterion ratings, were found for the seven
rating scales. The average intercorrelation
was only .16. Eleven positive and two negative
r’s of the matrix of twenty-one intercorrelations
were significantly different from zero
at the one per cent level. These correlations 
suggest that the actual trait ratings were, 
aside from some halo effect, relatively inde- 
pendent ratings and lend support to the first 
explanation. Although no measure of repeat 
reliability of the ratings was found, an ap- 
proximation can be obtained in the intercor- 
relation of the scales, Direction of Feeling 
towards Others and Social Impression, which 
was .60. One scale was designed to measure 
the patient’s general friendliness and the other 
the judges’ friendliness towards that patient, 
dimensions which are probably very closely 
related. Thus it would seem that the repeat 
reliability would be greater than .60. The 
interjudge reliability ranges from an eta of 
.55 to .68 indicating a fair degree of agree- 
ment between the judges in their ratings. 

It therefore seems reasonable to conclude 
there is sufficient reliability in the ratings and 
independence in the scale dimensions to justify 
combining the separate deviations from the 
criterion ratings into a total error score as a 
measure of the accuracy of each rater’s judg- 
ment of a particular patient. 

When the data were tabulated, no even 
moderately significant over-all difference was 
found to exist between the accuracy of ratings 
based on interviews conducted by Interviewer 
No. 1 (psychiatrist) and Interviewer No. 2 
(psychologist). However, objective differences 
in the way in which the two interviewers 
operated and their effects on the patients’ 
responses have been demonstrated by Bugental
(1) using the written protocols from this 
study. 

Since there was no demonstrable difference 
in the accuracy of ratings due to different 
interviewers, and in order to simplify further 
statistical treatment of the data, all the rat- 
ings on the same patient based on a particular 
mode of presenting the interview material 
were combined. This meant, in effect, superimposing
the bottom half of Table 2 on the
top half.

Analysis of variance. Data derived from 
the four effects or variables listed below were 
analyzed: 


1. Order in which a patient was observed 
and rated. 

2. Sequence or pattern in which different 
patients and modes of presenting the inter- 
views were observed and rated. This effect 
also contains any differences between the four 
combined groups of clinicians as each group 
followed one of the four sequences of presenta- 
tion. 

3. Patient being observed and rated. 

4. Mode of presenting the interview mate- 
rial. 

From Table 3 it is apparent that the factor 
of the patient being rated had the greatest 
effect on the accuracy of the ratings. Patients 
A and B, who were most accurately judged, 
have most of their criterion ratings at or near 
the theoretical mean and have the largest 
number of ratings on the favorable ends of 
the scales. 

The mode of presenting the interview ma- 
terial appears to have some influence over 
the accuracy of the ratings. Actually, the only 
statistically reliable differences (p of .05 or 
better) between the different modes are be- 
tween the silent film and the other three 
modes and between the written protocols and 
complete sound film. 

There was a significant order or practice 
effect, in that a clinician’s ratings tended to 
improve in accuracy from the first to the
third time he filled out the rating scale, but
then dropped on the last time. The drop 
might have been due to the fact that most 
clinicians rushed through the last rating, as 
they were free to leave when they finished it. 

Because of the partial confounding of main 
effects with interactions in the greco-latin 
square, it was impossible to measure directly 
the interaction between specific variables. 
Fig. 1 suggests considerable interaction be- 
tween the mode of presentation and the pa- 
tient being rated. The relative contributions 
of order, sequence, and interaction, however, 
cannot be computed. Another indication of 
interaction effects is that the variance estimate 
based on the square residual, a rough indi- 
cator of interaction and error, approaches 
significance (p of .05). 

From Figure 1 it is clear that, particularly 
when only visual cues were presented, the four 
patients’ mean accuracy ratings differ widely. 
When only the written transcript of the in- 
terview was presented the mean accuracy of 
ratings for all four patients showed much 
more agreement than for any of the other 
three modes of presentation. 

Table 3


Analysis of Variance on the Accuracy of Ratings


Source of variance        Sums of    Degrees of   Variance
estimate                  squares    freedom      estimate    F ratio
Order                       205.7        3          67.57     5.86***
Sequence                     75.2        3          25.07     2.12
Mode of presentation        188.6        3          62.87     5.38**
Patient                     274.3        3          91.43     7.81***
Square residual             133.0        3          44.33     3.87*
Within cells (error)      1,949.0      176          11.07
Total                     2,825.8      191


* Significant at .05 level of probability.
** Significant at .01 level of probability.
*** Significant at .001 level of probability.

Fig. 1. Mean accuracy of personality ratings of four patients under
different conditions of observation. Lower error scores indicate
greater accuracy.

Predictions of patient responses. The accuracy
of predicting a patient’s actual sentence
completion response was not appreciably affected
by which of the two clinicians conducted
the interview, so, as before, the scores
were combined for the same patients and
conditions of presentation.

It can be seen from Figure 2 that when the
clinicians saw only the silent movie of the
interview their predictions of the patient’s
responses were poorer than chance. In other
words, under such conditions the clinician
was more apt to be wrong than right in his
predictions. For all the other conditions the
predictions were considerably better than
chance.

Fig. 2. Mean accuracy for predicting responses of four patients under
different conditions of observation.


Table 4


Analysis of Variance on the Accuracy of Predictions of Patients’ Responses


Source of variance        Sums of    Degrees of   Variance
estimate                  squares    freedom      estimate    F ratio
Order                        26.9        3           8.97     —
Sequence                     32.8        3          10.93     1.05
Mode of presentation        974.9        3         324.97     [?]
Patient                     383.2        3         127.7      12.3***
Square residual             171.7        3          57.23     5.45**
Within cells (error)      1,827.2      176          10.38
Total                     3,416.7      191


** Significant at .01 level of probability.
*** Significant at .001 level of probability.

To determine how many predictions one
could expect on a chance basis alone, the
empirical probabilities of designating each of
the four choices for all the thirty-five completed
sentences were determined. The probabilities
for the thirty-five correct responses
for each patient were summed and are represented
by horizontal lines in Figure 2.
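
One way to read this chance baseline is sketched below: for each item, the proportion of all selections that fell on a given choice is taken as the probability of designating it, and the proportions attached to one patient's actual responses are summed over items. The pooling of selections across judges is an assumption, and the data and the name `chance_expectation` are hypothetical.

```python
from collections import Counter


def chance_expectation(selections, correct_choice):
    """selections[i]: every choice made on item i, pooled over judges.
    correct_choice[i]: the choice that is correct for one patient on item i.
    Returns the number of correct predictions expected from the popularity
    of the choices alone."""
    expected = 0.0
    for picks, key in zip(selections, correct_choice):
        counts = Counter(picks)
        expected += counts[key] / len(picks)
    return expected


# Hypothetical data: three items, choices labelled 0-3.
selections = [[0, 0, 1, 2], [3, 3, 3, 1], [2, 0, 1, 3]]
correct_for_patient = [0, 1, 2]
baseline = chance_expectation(selections, correct_for_patient)
print(baseline)
```

Under this reading a patient whose responses are popular choices gets a higher chance line than one whose responses are rarely picked, so each patient's accuracy is compared with a baseline of his own.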

In the analysis of variance done on the 
prediction scores, the same four variables or 
effects were considered as in the case of the 
ratings. Table 4 gives the results of the analy- 
sis of variance. Clearly the mode of present- 
ing the interview material has a most sig- 
nificant effect on the number of correct pre- 
dictions the clinicians were able to make. 
Actually the only statistically reliable differ- 
ences were between the silent film presenta- 
tion and each of the other three modes. 
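
The arithmetic behind an analysis of variance table of this kind can be sketched from the printed sums of squares and degrees of freedom: each variance estimate is a sum of squares divided by its degrees of freedom, and each F ratio is that estimate divided by the within-cells estimate. The figures are copied from Table 4. The variable and function names are illustrative, and using the within-cells term as the denominator for every effect is an assumption.

```python
# Sums of squares and degrees of freedom as printed in Table 4.
TABLE_4 = {
    "Order": (26.9, 3),
    "Sequence": (32.8, 3),
    "Mode of presentation": (974.9, 3),
    "Patient": (383.2, 3),
    "Square residual": (171.7, 3),
}
ERROR_SS, ERROR_DF = 1827.2, 176


def f_ratios(effects, error_ss, error_df):
    """Mean square of each effect divided by the within-cells mean square."""
    ms_error = error_ss / error_df
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}


ratios = f_ratios(TABLE_4, ERROR_SS, ERROR_DF)
for name, f in ratios.items():
    print(f"{name:22s} F = {f:6.2f}")
```

Computed this way, the Sequence and Patient ratios agree with the printed 1.05 and 12.3 to the precision given in the table.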

As in the case of the ratings, it was also 
found that the particular patient being con- 
sidered influenced the accuracy of predictions. 
The order and sequence of presentations pro- 
duced no significant effects. However, the 
square residual was found to yield a sig- 
nificant F ratio suggesting some possible in- 
teraction effect. In comparison with the ef- 
fect of different modes of presentation, this 
square residual is so much smaller that little 
doubt is cast on the significance of the single 
effect of the mode of presentation. Figure 2 
seems to indicate further no appreciable in- 
teraction such as was the case in Figure 1. 

Relationship between predictions and rat- 
ings. The rating-error scores and the number 
of correct predictions were correlated and the 
signs of the coefficients reversed to indicate 
the degree to which the accuracy of per- 
sonality rating was related to the accuracy of 
predicting patient responses. The resulting 
correlation coefficients are contained in Table 5.
Besides showing very little relationship between
the accuracy of the two evaluation instruments,
the coefficients even suggest that
under the sound-only condition the two
instruments are measuring inversely related
phenomena.
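
For readers who wish to check the significance marks in Table 5, one plausible reconstruction is the ratio of each coefficient to its large-sample standard error under a true correlation of zero, 1/sqrt(N - 1). That this was the statistic actually used is an assumption. It comes close to the printed values for the four separate conditions, with small differences that may reflect rounding of r. The function name `critical_ratio` is hypothetical.

```python
import math


def critical_ratio(r: float, n: int) -> float:
    """r divided by 1/sqrt(n - 1), its approximate standard error when rho = 0."""
    return r * math.sqrt(n - 1)


# Coefficients as read from Table 5 (N = 48 per condition).
conditions = [
    ("Silent film", 0.28, 48),
    ("Written transcript", 0.22, 48),
    ("Sound recording", -0.31, 48),
    ("Complete sound film", 0.03, 48),
]
for label, r, n in conditions:
    print(f"{label:20s} {critical_ratio(r, n):6.2f}")
```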


Table 5


Relationship Between Predictions and Ratings
under Different Conditions


Mode of interview                      Test         Number of measures
presentation                  r        statistic    correlated
Silent film                  .28        1.92          48
Written transcript           .22        1.51          48
Sound recording             -.31       -2.12*         48
Complete sound film          .03         .20          48
All conditions combined      .14        2.00*        192


* Significant at .05 level of probability.


Discussion


Several quite certain, several speculative,
and several practical interpretations can be
made of the results of this study. When trying
to rate a patient’s personality or predict
his behavior, it helps considerably to know
what he says about himself in an interview
situation, whether one reads or hears what he
had to say. Where the predictions of sentence
completions were made from just seeing a
patient, the results in all four cases were poorer
than could be expected on a chance basis.
The increment in accuracy was considerable
between the silent film presentation and the
three other presentations (Figures 1, 2) which
all had verbal content cues in common. One
might expect such results on a common sense
basis except for the discovery that as a group
the judges actually did worse than if they had
used a chance system or predicted an “average
patient.” It may well have been that
judges were misled by dress, appearance,
gestures, or facial expressions.

Another definite finding was that some pa- 
tients can be much more accurately pre- 
dicted or rated than others. However, although 
the “normal” subject’s responses were best 
predicted he was not the easiest to rate. Also 
with the ratings some patients were more ac- 
curately rated on the basis of one type of 
observation but were quite poorly rated in 
other situations. Thus there do not seem to 
be clear and consistent differences in the ac- 
curacy with which different patients can be 
judged that hold up for different media of 
observation and different forms of appraisal. 
There is some evidence that patients whose 
sentence completion responses were more av- 
erage-sounding, and frequently selected on a 
purely chance basis, were more accurately 
predicted and that those patients whose crite- 
rion ratings were mostly near the average 
range were more accurately rated. Gage (3) 
has specifically studied these types of effects. 

Of considerable interest is the lack of any 
clear and consistent relation between judges’ 
ability to make accurate ratings and their 
ability to make correct predictions. Also one 
wonders about the trend toward continued 
improvement with increased cues for the rat- 
ings as compared to the drop in accuracy in 
the complete sound film presentation for the 
predictions. Since they were not appreciably 
related, which of the two measures is the 
“truest” measure of ability to judge others, 
and consequently the most valid measure to 
use in comparing the value of different inter- 
view cue material? Quite clearly the two 
judging processes are different. According to 
Wallin’s classification (6) the ratings would
be considered analytical-type judgments, in- 
volving some degree of abstraction and the 
predictions would be nonanalytical or em- 
pathic-type judgments. Taft (5) suggests 
there is no necessary relationship between 
these two methods of assessing ability to judge 
others, which was well demonstrated in this 
study. Whereas Taft points out that personalities
and other characteristics of judges
are associated with success on either type of 
measure, results from this study suggest that 
the cues available to judges also have a dif- 
ferential value for the two types of judg- 
ments. 

Another explanation exists for the differ- 
ences shown in Figures 1 and 2, where the 
ratings show gradual improvement with ad- 
dition of cues while the predictions improve 
up to the sound-only condition, but then 
drop in accuracy with the complete sound 
film. In the case of the ratings the judges, with
each added bit of observation data, were ap- 
proximating the unlimited access to the pa- 
tient which the criterion judges had. Thus 
their accuracy improves as the experimental 
situation approximates the criterion situation, 
with possible biases based on a patient’s ap- 
pearance operating both on criterion and ex- 
perimental situation judges. Where the crite- 
rion was based on the patient’s actual free 
response to an open-ended incomplete sen- 
tence, opportunity to observe the patient ap- 
pears, if anything, to have had a slightly 
detrimental effect on accuracy. Although the 
results of this study are only suggestive, this 
difference has important implications for pre- 
diction studies. For instance, if a criterion 
such as job success is determined largely by 
supervisors with certain consistent biases or 
errors in judgment, prediction of “successful 
employees” should somehow include or take 
account of these biases or errors in the cri- 
terion so as to produce the most accurate 
predictions. However, if the criterion is some 
sort of performance or achievement measure 
little influenced by consistently biased supe- 
riors, prediction of success would of course 
be best accomplished by measures free of bias 
and conducive to objective appraisal. From 
this point of view, neither of the techniques 
for determining ability to judge others can be 
considered more “true” or basic, but both 
are appropriate representatives of different 
kinds of judgments we make about others. 

Because the differences in accuracy of rat- 
ings and predictions between the three condi- 
tions containing content cues were not sta- 
tistically significant, the safest conclusion 
would be that, in general, with this type of 
information-gathering, diagnostic type of interview,
there are no appreciable differences
introduced by the different sorts of cues avail- 
able to the judge as long as the content of 
the interview was available to him. The in- 
teresting trend towards poorer predictions 
with the addition of visual cues to content and 
auditory cues should be subjected to further 
study with a greater number of patients to be 
observed, and limitation of the study to only 
the two conditions, so as to reduce the amount 
of variance introduced from other less relevant 
sources. If one accepts the conclusion that 
the three conditions produce essentially simi- 
lar results, a number of practical recom- 
mendations can be made. For financial or 
other considerations it is frequently possible 
to obtain only verbatim transcripts or sound 
recordings of an interview for use in research 
or other purposes. The question has been 
raised as to whether such material is an 
adequate representation of the original session,
or whether a complete sound film is
necessary. It would appear on the basis of
this study that written protocols or sound 
recordings are quite adequate, and perhaps 
even superior where the purpose is to gain 
predictive knowledge of a person. 

Although this study was not done with 
the employment or selection interview as the 
focus, some of the conclusions from this study 
suggest variations in selection interviewing 
that might be found to be useful and ad- 
vantageous. Often at considerable expense 
and inconvenience to both applicant and em- 
ployer, a personal interview is arranged. If 
the results obtained in this study hold for the 
assessment of employees, then it might be 
possible for the applicant to be interviewed 
by telephone or for him to submit either a 
sound recording or a written protocol of an 
interview conducted by some recognized local 
interviewer. The employer could then have 
this “canned” interview appraised without 
actually having to see the applicant. Fay and 
Middleton (2) report better results in select- 
ing sales personnel by a similar technique. 
Whether such a procedure would emotionally 
satisfy an employer who has become accus- 
tomed to “seeing anyone I hire” is another 
question. 

The noticeable reduction in variation of 
mean rating error scores in the written transcript
condition shown in Figure 1 invites
some explanation. In the other three condi- 
tions the appearance of some expressive be- 
havior of the patient is observable and may 
tend either to aid or mislead the clinicians. 
With the written transcript there are only the 
bare words used by the patient, which evi- 
dently have more nearly the same significance 
or cue value to the clinicians. Specific in- 
stances of the helpful and confusing effects 
of these noncontent cues will appear in an- 
other paper. Also because it was impossible to 
control whether a clinician reread parts of the 
written transcript, or to know how well each 
person could hear the recorded spoken words, 
variation in accuracy due to misunderstand- 
ing was doubtlessly reduced with the written 
protocols. 

Because this study was based on recorded 
interviews and the judges were limited to ob- 
serving someone else interview patients, the 
results cannot be immediately generalized to 
all “live” interviewing. They do, however, sug- 
gest that interviewers should be careful to 
avoid being misled by a patient’s appearance, 
should critically consider how they use ex- 
pressive cues, and should continue to rely 
heavily on what a patient says. 


Summary and Conclusions 


Interviews with four patients were recorded 
on sound film to study the sorts of observa- 
tions that aid the clinician in making ac- 
curate judgments of a patient’s personality 
and predictions of his verbal behavior. The 
interviews were presented as silent films, 
written transcripts, sound recordings, or com- 
plete sound films to forty-eight psychiatrists, 
social workers, and psychologists, who made 
ratings of the personality characteristics and 
predicted the responses to incomplete sen- 
tences of the patients who were interviewed. 

The following conclusions can be drawn 
from this study: 


1. There was marked and significantly 
greater accuracy of personality ratings when 
content cues were included, as in the written 
transcript, sound recording, and complete 
sound film, as compared to the silent film. 
Although this difference was the only clearly 
significant one, there was a general trend 
with considerable variation from case to case
for the ratings to improve in accuracy as 
more and more cues were added. 

2. Markedly and significantly better pre- 
dictions were made from written transcripts, 
sound recordings, and complete sound films 
than from silent films, which actually led to
predictions which were worse than could be 
expected on a chance basis. An interesting 
difference from the trend towards gradual im- 
provement with additional cues found with 
the ratings was seen with the predictions 
where somewhat poorer predictions were made 
in the complete sound film condition. The 
suggestion that visual cues or a patient’s 
appearance may impair predictions should 
be subjected to further study. 

3. As might well be expected, significant
differences were found in how accurately each
patient was rated and his responses were
predicted. In the case of the predictions, the level
of predictive accuracy varied for different pa- 
tients but the pattern for each patient over 
the four modes of presentation remained quite 
similar. However, with the ratings some pa- 
tients were more accurately rated in one con- 
dition and less in another, suggesting con- 
siderable interaction between the effects of 
the patient being rated and the type of ob- 
servation. 

4. No appreciable relationship was found 
between accuracy of personality ratings and
correct predictions, which probably can be
accounted for on the basis that the two evalu- 
ation techniques may tap quite different as- 
pects of personality, and also are based on 
somewhat different sorts of cues. 

5. No definitely superior medium was found 
for recording diagnostic-type interviews for 
studies of judgment of others so that any 
choice between written protocols, sound re- 
cordings, and complete sound films can be 
made on the basis of practical considerations, 
or factors not covered by this study. 


Received April 19, 1955. 
The Stability of Interaction Chronograph Patterns
in Psychiatric Interviews¹


George Saslow, Joseph D. Matarazzo
Massachusetts General Hospital²


and Samuel B. Guze 
Washington University School of Medicine 


The objective assessment of personality is 
an old problem in psychological research. 
During the past fifty years many approaches 
to this problem have been undertaken, which 
have included questionnaires, paper and pen- 
cil inventories, projective tests such as the 
Rorschach and TAT, sentence completion 
tests, and others. The results with these vari- 
ous instruments, while sometimes positive, 
have been, in the minds of many investigators, 
less than encouraging. These investigators 
have felt that to attempt to assess fixed or 
unchanging intrapsychic traits such as levels 
of anxiety, the presence or absence of “de- 
fenses” of various kinds, etc., by such instru- 
ments was to commit the “organism error” 
described by MacKinnon (14, p. 43). This 
error involves the search for stable or in- 
variant personality characteristics which will 
be present independent of the stimulus situa- 
tion or “field” in which one is attempting to 
observe or record these traits. For this reason, 
many research workers, as well as many prac- 
ticing clinicians, use the equally unreliable 
(used here in the sense of unstandardized) 
“interview” as their instrument of assessment. 
One reason given for this is that the inter- 
view is flexible and allows one to explore 
unique areas of behavior for each individual. 
Also, the interview presents a “dynamic” 


1 This investigation was supported by a research 
grant (M-735) from the National Institute of Mental 
Health, of the National Institutes of Health, U. S. 
2 The data of this study were collected while all
three authors were at the Washington University 
School of Medicine. 


stimulus situation so that every individual is 
given an opportunity to manifest his unique, 
and presumably learned, social or interper- 
sonal behavior patterns (18). 

The notoriously low inter-interviewer agreement
on assessment of personality patterns of
functioning for given series of subjects (12,
15) has made the interview subject to criti- 
cism also. It is important, therefore, if the 
interview is to be an instrument of research, 
to ask what could be achieved if the interview 
were standardized. Chapple (1-6), working 
primarily in applied anthropology, has been 
developing just such a standardized interview 
during the past several years. A review of 
this development and of the instrument, the 
interaction chronograph, used for making 
measurements of the interaction between in- 
terviewer and interviewee during the stand- 
ardized interview, is now in preparation and 
may be consulted for a more thorough de- 
scription than is possible below. 

Essentially, the interaction chronograph is 
a device which allows an observer to record 
in time units with a high degree of precision 
the behavioral interaction³ of two individuals
in terms of some ten or more variables. These 
variables, definitions of which are given in 
Table 1, are objectively recorded by a series 
of electrically controlled counters which are 
connected to two keys, one for the interviewer, 
the other for the subject. Each key is de- 
pressed whenever the designated individual is 

3 Records include only such behavior as number of
utterances, number of interruptions, their durations,
etc., and not the content of the verbalizations.

Table 1
Definitions of Ten Interaction Chronograph Variables for Which Reliability Was Determined 


1. A’s Units: The designation “A” is used rather than
“Pt.’s units” since interviewees other than patients 
are often recorded. This counter provides a fre- 
quency count of the actions of the interviewee. Each 
time he is active, and the observer depresses his key, 
the unit counter adds one unit. Thus a cumulative 
record is made of the number of times the individual 
was active during a period designated by the signal 
counter. In view of the fact that different people 
have different units of action during comparable 
interviews or time periods, Chapple divides each of 
the following variables (2-9) by the number of A’s 
units and thus obtains mean values which make 
possible comparisons between individuals. 

2. Tempo: The present machine does not record inter- 
viewee actions and silences separately (these, how- 
ever, can be computed readily when desired). 
Rather, this counter records the duration of each 
action plus its following inaction in a single measure. 
Thus it provides an index of how often one starts to 
act; the duration from one action to his next action; 
or his “tempo” of starts. 

3. Activity: This, too, is a duration measure derived 
from an individual’s actions and silences. However, 
unlike Tempo which is a measure of actions plus 
silences, the activity counter adds actions and sub- 
tracts silences. The cumulative record indicates, 
therefore, for any interview sequence, how much 
more active a person was than he was silent. Thus, 
the activity counter is an index of a subject’s 
“energy level,” states Chapple. 

4. A’s Adjustment: This counter, unlike Tempo and 
Activity, does not operate all the time the machine 
is on. It records only durations during which the 
person interrupted B (the other person) or failed to 
respond to B following the latter’s last action; both 
of which are failures of adjustment by A. The 
counter adds the durations of interruptions and sub- 
tracts from these the durations of failures to re- 
spond. Thus a positive “adjustment” score indi- 
cates that on the average A interrupted longer than 
he was silent. A negative score indicates the converse.
One would expect a type of “manic” subject
to earn a positive score while some “depressed” 
individuals should earn negative adjustment scores. 

5. B’s Adjustment: The other individual’s adjustment 
(i.e., the adjustment of the interviewer, usually). 
For a description substitute “B” for “A” in (4) 
above. 

6. Initiative: This is a frequency counter which records 
only after a period in which both A and B were silent 
(both keys up). It provides a measure of the relative 
frequency with which one person, A, takes the initi- 
ative as against the other person, B. It adds just 
one each time A takes the initiative and subtracts 
one each time B takes the initiative. A person who 
initiates more often than he has to be initiated to 
earns a plus score whereas a negative score indicates 
the converse. 

7. Dominance: This is a frequency counter which re- 
cords only during an interruption (double action, 
with both keys down). It provides a measure of the 
relative frequency with which one person, A, out- 
talks or out-acts the other person, B, when there has 
been an interruption. The Dominance counter adds 
one if A dominates B in a double action, or sub- 
tracts one if B dominates A. A positive score indi- 
cates that A was more dominant than B in these 
exchanges, and a negative score the converse. 

8. A’s Synchronization: This is a frequency counter 
which records the number of times A has interrupted 
B or failed to respond to B, i.e., the number of times 
A failed to synchronize with B. The earlier described 
counter, A’s Adjustment, records the relative dura- 
tion of these events; the present counter their fre- 
quency. 

9. B’s Synchronization: The other individual’s syn- 
chronization (i.e., the synchronization of the inter- 
viewer, usually). For a description substitute “B” 
for “A” in (8) above. 

10. B’s Units: This counter provides a frequency count
of the actions of the other person (usually the inter- 
viewer). It adds one unit each time the observer 
depresses B’s key during an interactional exchange. 


talking, nodding, gesturing, or in other ways 
communicating (interacting) with the second 
person. Values for these variables are cumula- 
tive and can be abstracted from the printed 
record of the total interview with little dif- 
ficulty. Some of these variables may seem 
unusually arbitrary, since they represent alge- 
braic sums of two variables rather than indi- 
vidual measures of each of these variables. 
Apparently Chapple, in developing his inter- 
action theory of personality, has found these 
derived variables more useful than the first- 
order variables from which they were obtained.
In addition to containing individual
counters for each variable, the interaction 
chronograph has a “signal” counter which 
functions as a marker to record the start of 
different periods of the interview. 
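
The counter definitions in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of the machine itself: the names `Unit`, `summarize_units`, `adjustment`, and `signed_count` are hypothetical, the record is assumed to be already reduced to durations and outcomes, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One unit of A's behavior: an action plus the inaction that follows it."""
    action: float   # duration of the action (hundredths of a minute, assumed)
    silence: float  # duration of the following inaction (same unit)


def summarize_units(units):
    """Return A's Units and per-unit means for Tempo and Activity."""
    n = len(units)  # A's Units: one count per action
    if n == 0:
        raise ValueError("at least one unit is required")
    tempo_total = sum(u.action + u.silence for u in units)     # action plus silence
    activity_total = sum(u.action - u.silence for u in units)  # action minus silence
    # Table 1 notes that the cumulative values are divided by A's Units.
    return {
        "units": n,
        "mean_tempo": tempo_total / n,
        "mean_activity": activity_total / n,
    }


def adjustment(interruption_durations, failure_durations):
    """Add interruption durations and subtract failure-to-respond durations."""
    return sum(interruption_durations) - sum(failure_durations)


def signed_count(winners, person="A"):
    """Initiative or Dominance style counter: +1 when `person` wins, -1 otherwise."""
    return sum(1 if w == person else -1 for w in winners)


# Hypothetical record: two units; A takes the initiative twice, B once.
summary = summarize_units([Unit(30, 10), Unit(50, 10)])
initiative = signed_count(["A", "B", "A"])
```

A positive value from `adjustment` or `signed_count` corresponds to the plus scores described in Table 1, and a negative value to the converse.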

Chapple (2) and Goldman-Eisler (7, 8, 
9, 10), working independently in England, 
have found that the use of this instrument 
for the objective assessment of personality 
patterns is made more difficult by the differ- 
ences in “interaction patterns” of different 
interviewers. Furthermore, any one interviewer may use with any given
patient only a small portion of the entire range of interactional
behavior available to him, as shown in a recent study of a diagnostic
interviewer by Saslow et al. (17). Thus, it has been demonstrated that
the differences in inter- and intra-interviewer interaction patterns
have a subtle but very marked effect on the interviewees’ interaction
patterns when these are carefully and objectively recorded. These
experimental results help to define some of the uncontrolled variance
which in the past has made the interview, including the interviewer’s
behavior, a less than adequate research tool. Experimenters have
attempted to use the interviewer as an independent variable in their
efforts to measure various interviewee characteristics (the dependent
variables). But experience has indicated, and the results of Chapple,
Goldman-Eisler, and Saslow et al. confirm, that the “interviewer” is
not an objective scientific instrument and that, in fact, interviewers
are themselves dependent in the sense of unstandardized variables.

The effect of differences between interviewers is very convincingly
demonstrated in a recent study by Raines and Rohrer (15). Although
using an approach (psychiatrists filled out rating scales and gave
psychiatric diagnoses following a screening interview) which was
different from that used by both Goldman-Eisler and Chapple, Raines
and Rohrer conclude:

    The results of this study support a “projection” hypothesis to
    account for the variance between psychiatrists observed here; that
    is, that one of the major variables responsible for variation
    between psychiatrists in impressions of patients is the differing
    personalities of the psychiatrists themselves (15, p. 732).

It is little wonder then, for example, that the science of
psychotherapy has made so little headway when its effects have been
“evaluated” by various interviewers (16).

Chapple has developed a standardized interview in order to facilitate
the process of making the interviewer (and the conduct of the
interview) an independent variable, and thereby to permit comparison
of results in different settings obtained by different interviewers,
or results obtained by the same interviewer at various times.
Chapple’s standardized method involves certain “rules” for the
interviewer to follow in his own interviewing behavior and in the
over-all conduct of the interview itself. The standardized interview
is divided into five periods, with periods 1, 3, and 5 as free
give-and-take periods, and periods 2 (silence) and 4 (interruption) as
stress phases of the interview. The characteristics of this
standardized interview are shown in Table 2, whereas the “rules”
governing the interviewer’s standardized behavior are given in Table 3.

Thus Chapple’s standardized interview provides a means for eliciting a
sample of patients’ behaviors (dependent variables) in a complex but
miniature, molar, interpersonal situation, the characteristics of
which to a certain degree are objective and predefined (independent
variables). Such controlled “field” situations would seem to hold
promise of improved assessment of personality.


Table 2
Characteristics of the Standardized Interview

                                          Duration of period
Period  Type of interviewing     Fixed duration   Variable duration

I       Free                     10 minutes
II      Stress (Silence)                          12 failures to respond or 15
                                                  minutes, whichever is shorter
III     Free                     5 minutes
IV      Stress (Interruption)                     12 interruptions or 15 minutes,
                                                  whichever is shorter
V       Free                     5 minutes

Total                            20 minutes       plus a maximum of 30 more minutes


Table 3
Standardized Interviewer’s Behavior

Rules for Interviewer

Periods 1 to 5 (all periods):

a. Interviewer introduces each period by a 5-second
utterance (following his signal to the observer).

b. All interviewing must be nondirective. No direct
questions, no probing or depth interviewing. Interviewer
can reflect, ask for clarification, ask for more
information, introduce a new topic area, etc. In
general, interviewer’s comments should be nonchallenging
and open-ended and related to the patient’s
past comments or to some new, general topic.

c. All interactions must be verbal only, or verbal and
gestural at the same time; i.e., interviewer cannot
use head nods and other gestures alone. This rule
simplifies the observer’s task.

d. All of interviewer’s utterances must be of approximately
5 seconds’ duration.

e. After patient finishes a comment or other interaction,
interviewer must respond in less than 1
second, except as otherwise noted in Period 2.

f. Each time patient interrupts interviewer, the latter
must continue to talk for 2 more seconds. This rule
insures more explicit definition of a patient’s
ascendance-submission pattern than would be possible
if interviewer “submitted” immediately.

Periods 1, 3, and 5:

a. Interviewer must never interrupt patient.

b. If after interviewer makes a comment patient does
not respond, interviewer must wait 15 seconds and
then speak again for 5 seconds.

Period 2 only:

a. Interviewer must “fail to respond” to last interaction
of patient a total of 12 times (or for 15
minutes, whichever is shorter).

b. After interviewer has been silent for 15 seconds
(and patient has not taken initiative) interviewer
makes another 5-second comment.

Period 4 only:

a. Each time patient acts, interviewer must interrupt
patient for 5 seconds for a total of 12 times.

b. Interviewer’s interruption should begin about 3
seconds after patient has begun his interaction.

c. After having interrupted patient, if the patient
continues through the interruption (does not submit),
interviewer will not interrupt again until
patient has finished his utterance, i.e., interviewer
will interrupt patient only once during each utterance
of the latter if patient does not “yield.”

d. The period is ended after 12 interruptions or 15
minutes of attempting to obtain these.
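
The duration rules in Table 2 imply that a complete interview lasts between 20 and 50 minutes. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and it is an assumption of the sketch that each stress period is described by the sorted times, in minutes from the start of the period, at which its failures to respond or interruptions occurred.

```python
# Free periods have fixed lengths in minutes (Table 2).
FIXED_MINUTES = {1: 10, 3: 5, 5: 5}
# Stress periods end at 12 events or 15 minutes, whichever is shorter.
STRESS_EVENT_TARGET = 12
STRESS_CAP_MINUTES = 15.0


def stress_period_length(event_times, cap=STRESS_CAP_MINUTES,
                         target=STRESS_EVENT_TARGET):
    """Length of one stress period given sorted event times in minutes."""
    if len(event_times) >= target:
        # The period closes when the 12th event occurs, but never after the cap.
        return min(event_times[target - 1], cap)
    return cap  # fewer than 12 events: the period runs to the cap


def interview_length(silence_events, interruption_events):
    """Total length: fixed free periods plus the two variable stress periods."""
    fixed = sum(FIXED_MINUTES.values())
    return (fixed
            + stress_period_length(silence_events)
            + stress_period_length(interruption_events))


# Hypothetical patient who yields an event every half minute in both periods.
fast = [0.5 * i for i in range(1, 13)]
longest = interview_length([], [])
shorter = interview_length(fast, fast)
```

With no events at all the sketch returns the 50-minute maximum, which matches the total row of Table 2.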


The aim of the present study was to in- 
vestigate the consistency of scorable interac- 
tion patterns elicited from patients inter- 
viewed independently by two different inter- 
viewers in a standardized manner. Thus the 
question was asked, “Will patients behave in 
the same way in independent interviews with 
two interviewers, if the latter employ similar 
interviews and interviewing behaviors?” Put 
differently, the question was asked: Is it pos- 
sible to obtain inter-interviewer reliabilities 
of patient interaction patterns of magnitudes 
higher than those obtained by Raines and 
Rohrer (and numerous other investigators) 
when the behavior patterns of the interviewers 
are more controlled and less subject to indi- 
vidual, uncontrolled variation? It was recog- 
nized at the outset that the problem of the 
reliability of the measuring instrument was 
confounded with the question of the variance 
or invariance of the interaction patterns being 
measured by this instrument. It was reasoned 
that high reliability coefficients on the various
variables would tend to indicate both stability 
of interaction patterns of patient-subjects and 
also the reliability of the instrument of meas- 
urement; whereas low or insignificant correla- 
tions would not permit one to decide whether 
the unreliability was in one of these factors, or 
the other, or both. 


Procedure 


After they practiced the interview technique 
described in Tables 2 and 3 in some 40 inter- 
views each, the objective interaction scores 
of the interviewers indicated that they were 
approaching the prescribed patterns with a 
high degree of accuracy. This was achieved 
without the interviewers feeling unduly in- 
hibited in their responses to the patients. At 
this point the present study was initiated. 
The design called for twenty patients, each to 
be interviewed independently by the two in- 
terviewers, one of whom was a young in- 
ternist experienced in interviewing, and the 
other an older psychiatrist. The Ss were out- 
patients sent to the psychiatric clinic of a 
large Midwestern clinic for screening and as- 
signment to individual psychotherapists. They 
were new patients, white, 11 men and 9 
women, ranging in age from approximately 
18 to 55. The presenting problems were typical
of the population of this outpatient clinic,
and consisted of cases of anxiety reaction, 
hysteria, depression, schizophrenia, obsessive- 
compulsive neurosis, duodenal ulcer, and pos- 
sible chronic brain syndrome. 

The design was randomized to control order 
effects and each interviewer thus interviewed 
every other patient first; an AB order with 
the first patient and a BA order with the 
second, etc. In this way each interviewer 
came first in the sequence for ten patients 
and second for the remaining ten. Statistical 
evaluation of these “interviewer-order”’ effects 
on patients’ behavior was therefore possible. 
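
The alternating AB, BA assignment described above can be written out as a short sketch. The function name and the labels "A" and "B" are illustrative assumptions; the only facts taken from the text are the alternation and the twenty patients.

```python
from collections import Counter


def interviewer_order(n_patients, interviewers=("A", "B")):
    """Alternate which interviewer sees each successive patient first."""
    first, second = interviewers
    return [
        (first, second) if i % 2 == 0 else (second, first)
        for i in range(n_patients)
    ]


orders = interviewer_order(20)
# Count how often each interviewer came first in the sequence.
first_counts = Counter(order[0] for order in orders)
```

Each interviewer comes first for ten of the twenty patients, which is what makes a separate test of the interviewer-order effect possible.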

Ash trays were removed from the inter- 
viewer’s desk so that patient smoking would 
be discouraged. No mention was made of the 
experiment by the interviewer, and the experi- 
ment began when the interviewer opened the 
interview with such a statement as “My name
is Dr. ____. Can you tell me how you happened
to come to the clinic at this time?”
The interviewer pressed a concealed button 
for a light signal to the observer. The ob- 
server was seated in the next room on the 
other side of a large one-way window (5 ft. x 
2 ft.) and recorded the interaction by pressing 
designated buttons for the interviewer and the 
patient. A high-fidelity microphone was hung 
from the ceiling and connected to an amplifier 
in the observer’s room. The side effects of the 
recorded voices in the observer’s room were 
controlled by piping the voices to the observer 
through earphones. Throughout the experi- 
ment, sound recording and visual observation 
of the interactions were excellent. Verbatim 
recording of the interview exchanges was made 
by means of recently devised highly sensitive 
microphones, connected directly to an Audograph
machine in the observer’s room. The
Audograph discs were transcribed for sub- 
sequent studies (content analyses, etc.). 

The second interviewer was always in an- 
other building when the first interviewer was 
conducting his interview, thereby insuring in- 
dependence; both began with zero knowledge 
about the patient. After finishing his inter- 
view, the first interviewer would say, “Mr(s). 
——., I’d like another doctor in our clinic to 
talk with you now, and if you will wait here 
a minute or two I will go to get him.” Upon 
his arrival, the second interviewer signalled 
the observer and the second half of the ex- 
periment began. 

Subjects were used in the order in which 
they were sent to this particular clinic. How- 
ever, due to power failure in the hospital on 
one occasion, and to total incoherence on the 
part of one patient, two of the consecutive 
planned experiments were lost. 


Results 


From Table 2 it is seen that twenty min- 
utes of the interview are fixed and that, de- 
pending upon the subject’s behavior in periods 
2 and 4, thirty more minutes are possible to 
complete the standardized interview. Table 
4 contains an analysis of the actual mean 
length of the twenty pairs of interviews. It 
can be seen that the standardized interview 
takes, on the average, about 33 minutes to 
conduct, and that both doctors had essentially 
equal interview times. Furthermore, the order 
of the interview, comparing all 20 first inter- 
views or all 20 second interviews—independent 
of which doctor was first or second—yielded 
no differences in mean length of interview 


Table 4
Analysis of Mean Length of Total Interview for 20 Patients Interviewed Twice

Analysis                                              Mean            Range

Length of all 40 interviews                           32.8 minutes    25.7 to 50.3
a. Length of all first interviews (N = 20)            32.9 minutes    26.9 to 41.4
b. Length of all second interviews (N = 20)           32.8 minutes    25.7 to 50.3
c. Length of all first doctor’s interviews (N = 20)   33.5 minutes    26.9 to 50.3
d. Length of all second doctor’s interviews (N = 20)  32.2 minutes    25.7 to 41.2
Table 5

Means, Standard Deviations, Ranges and Coefficients of Correlation Across Total
Interview for Each of the Major I-C Variables

Variable                     Dr. 1              Dr. 2              r       p

1. Mean Pt.’s Units          72.20              69.85              .747    .01
   SD                        25.55              20.90
   Range                     25 to 127          29 to 112

2. Mean Pt.’s Action*        59.85              56.25              .956    .01
   SD                        56.90              54.60
   Range†                    13 to 243          17 to 257

3. Mean Pt.’s Silence        9.20               8.30               .759    .01
   SD                        3.11               2.41
   Range†                    5 to 17            3 to 14

4. Mean Tempo                57.20              52.85              .908    .01
   SD                        36.87              27.26
   Range†                    24 to 159          26 to 142

5. Mean Activity             39.55              36.20              .930    .01
   SD                        40.49              30.01
   Range†                    -2 to 150          2 to 131

6. Mean Pt.’s Adjustment     -1.53              -1.28              .802    .01
   SD                        1.33               1.17
   Range                     0 to -5            0 to -4

7. Mean Dr.’s Adjustment     -1.83              -1.53              .737    .01
   SD                        1.17               .80
   Range                     -3.57 to +1.00     -3.03 to +.14

8. Mean Pt.’s Synchron.      .84                .84                .726    .01
   SD                        .07                .06
   Range                     .63 to .99         .76 to .98

9. Mean Dr.’s Units          62.79              59.74              .772    .01
   SD                        22.78              18.28
   Range                     21 to 109          27 to 93

* Values for variables 2 through 7 are given in hundredths of a minute. To convert to seconds, multiply the given value by 0.6.

† Rounded to nearest whole number.



when subjected to statistical analysis. It is of 
interest to remember that, relatively speaking, 
no restrictions were placed on the lengths of 
two of the periods (i.e., patients could vary 
in their behavior in these periods such that 
the lengths of these periods could range from, 
say, one minute to fifteen minutes). Table 4 
indicates that even with this freedom to vary, 
the patients behaved similarly in regard to 
this variable with the two interviewers. The 
mean lengths in minutes for the two doctors, 
respectively, were 8.41 and 8.44 in period 2 
(r = .631), and 3.68 and 2.26 in period 4 
(r = .555). These differences were not sig- 
nificant by F test, while the values of r were 
significant at the .01 level of confidence. These 
first observations thus point to the existence 
of stability in the interaction patterns of pa- 
tients in a standardized interview. 

Table 5 presents, for the 20 patients, the 
means, standard deviations, and ranges (for
the total interview) of nine of the interaction
chronograph variables. Values are given for 
each doctor, with Pearson coefficients of cor- 
relation shown in the second column from 
the last. The results show a striking stability 
in interaction variables from first interview 
to second interview, for each of the twenty 
patients. 
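
Table 5 reports variables 2 through 7 in hundredths of a minute. Since one minute is 60 seconds, one hundredth of a minute is 0.6 second, which is the multiplier given in the table footnote. A small worked sketch, with a hypothetical function name and two Dr. 1 means taken from Table 5:

```python
def to_seconds(hundredths_of_minute):
    """Convert a duration in hundredths of a minute to seconds."""
    return hundredths_of_minute * 0.6  # 60 seconds / 100


# Dr. 1 means from Table 5: patient's action and Tempo.
mean_action_dr1 = to_seconds(59.85)  # about 35.9 seconds
mean_tempo_dr1 = to_seconds(57.20)   # about 34.3 seconds
```

On this reading the average patient action with Doctor 1 lasted a little over half a minute.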

The stability in patients’ interaction pat- 
terns (and corollary reliability of measure- 
ment by the interaction chronograph) is dem- 
onstrated in Table 5 in several ways: (a) 
relative reliability between the behaviors 
manifested by each of the 20 patients with 
the two doctors, and (b) absolute reliability
of these same interaction patterns. 

Relative reliability is inferred from the 
values of the Pearson product-moment coef- 
ficients of correlation shown in the second 
column from the last. These coefficients range 
from .726 to .956 and are all significant at the 
.01 level of probability. They indicate that,
for the interview as a whole (summing all five 
periods), each of these 9 interaction chrono- 
graph variables has a marked stability, i.e., 
that two interviewers will elicit these pat- 
terns from each of the 20 patients in amounts 
which are relatively similar for any one of 
these individuals. 

However, since one can get high values of 
Pearson r for any one variable when 2 inter- 
viewers elicit different amounts of this varia- 
ble from the same patient, providing the rela- 
tive values of these amounts are maintained 
from patient to patient,⁴ these high values for
r are in themselves not sufficient to demon- 
strate reliability in the sense of stability (or 
invariance) in patient characteristics. 
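
The point can be checked numerically. The following is a minimal sketch in Python, not part of the original study; it computes Pearson r for the three-patient example given in the footnote, in which the second doctor obtains exactly double the units of the first. The helper name `pearson_r` is illustrative only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Units obtained from three patients (numbers taken from the footnote).
doctor_1 = [30, 40, 50]
doctor_2 = [60, 80, 100]  # exactly double the values for doctor_1

r = pearson_r(doctor_1, doctor_2)
mean_gap = sum(doctor_2) / len(doctor_2) - sum(doctor_1) / len(doctor_1)

print(round(r, 3))  # relative agreement between the two doctors
print(mean_gap)     # absolute difference between the two means
```

The correlation is perfect although the two means differ by 40 units, so a high r alone does not establish stability (invariance) in absolute amounts.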

To demonstrate the latter, one needs to 
examine the values for the means, SD’s, and 
ranges for any one variable across the two 
interviewers. Examination of Table 5 indi- 
cates an equally striking absolute reliability 
for these 9 variables for the 20 patients with 
the two interviewers. Thus, for example, de- 
spite very marked individual differences in 
the interaction rate of the 20 patients (pt.’s 
units), as seen by the fact that patients varied 
from 25 to 127 units with Doctor 1, and 29 
to 112 with Doctor 2, the average number of 
units for these 20 patients was 72 with Doctor 
1 and almost 70 with Doctor 2. That is, 
despite the fact that, during an interview of 
33 minutes’ average length, one patient inter- 
acted 25 times (less than once a minute) and 
another patient 127 times (4 times per min- 
ute) with Doctor 1, and similarly with Doc- 
tor 2, any single individual seemed to main- 
tain his own interaction rate independent of 
which doctor was interviewing him. Thus one 
individual interacted 25 times with the first 
doctor and 29 times with the second, while 
another subject interacted 112 and 129 times, 
respectively, with the two interviewers. This 
marked similarity in values of the means, as 
well as the equally striking similarity in stand- 


4 One could get a perfect correlation (1.00) on
pt.’s units, for example, if the second interviewer
always got double or some other multiple of the
number of units from the same patients; i.e., if from
3 pts. one doctor got 30, 40, and 50 units, and the 
second obtained 60, 80, and 100, the value of r 
would be 1.00. 


ard deviations and ranges for all the variables, 
would tend to indicate, when added to the 
high values of the Pearson r’s, that these 
interaction variables reflect stable and in- 
variant personality characteristics under these 
standardized conditions in the subjects studied. 
Before leaving Table 5 several other findings 
should be pointed out. The first is the finding 
that how active (“action”) a patient is on the 
average, is a more stable or invariant charac- 
teristic from interviewer to interviewer (r = 
.956) than is any other variable measured. 
Average length of silence, on the other hand, 
while fairly stable is less so (r = .759). Tempo 
and activity, which are algebraic summations 
of action and silence, are very stable (.908 
and .930) and yield values similar to those 
reported by Chapple on 12 subjects (2). 
Chapple gave no reliability figures for the 
other variables. 

It is of interest to note also that the two 
interviewers interacted with the patients in 
similar ways. From Tables 2 and 3 it can be 
seen that no attempt is made to standardize 
how often the interviewer should interact with 
the patient. Except for somewhat defining his 
number of units in periods 2 and 4 (thus con- 
trolling his behavior in 12.8 minutes of the 
total 32.8 minutes) he is free to follow the 
patient’s initiative in regard to this variable 
during the three remaining free periods (20 
minutes). Despite this freedom the two doc- 
tors had similar average numbers of Dr.’s 
units of interaction (63 and 60, with an r of 
.772) and similar “adjustments” (algebraic 
sum of durations of interruptions and failures 
to respond). 

Table 5 contains values for only 9 of the 10 
variables defined in Table 1. Values for Dr.’s 
synchronization were not included because in- 
spection revealed a very limited range in the 
scores for this variable—at or very near 1.00 
throughout—making statistical analysis of 
little usefulness. The value for r for Dr.’s
synchronization was -.058, while the values for
the means, SD’s, and ranges were .98, .029, 
and .94 to 1.04 for Doctor 1; and .96, .050, 
and .84 to 1.02 for Doctor 2, respectively. 
The values for this variable are an internal 
check, and indicate that the two interviewers 
did in fact follow the standardized rules very 
closely and, except in periods 2 and 4, which

Table 6

Coefficients of Correlation for Two Interviewers for Each of the Major I-C
Variables in Each of the Individual Periods of the Interview

Period    Pt.’s     Pt.’s     Pt.’s     Tempo     Activity  Pt.’s     Dr.’s     Pt.’s     Dr.’s
          units     action    silence                       adjust-   adjust-   synchro-  units
                                                            ment      ment      nization
I         .726      .685      .845      .665      .593      .809      .476      .203      .715
II        .554      .817      .899      .802      .832      .136      .864      .836      .784
III       .365      .485      .121      .496      .471      .334      .212      .024      .436
IV        .553      .849      .435      .609      .811      .696      .625      .591
V         .517      .735      .479      .736      .735      .820      .771      .006      .432
Total     .747      .956      .759      .908      .930      .802      .737      .726      .772

Note. For r = .444, p < .05; for r = .561, p < .01.


will be discussed later, they neither inter- 
rupted nor failed to respond, therefore earn- 
ing scores near 1.00 on this variable during 
the various periods. Due to the nature of the 
construction of the present form of the inter- 
action chronograph, the very brief pauses of 
the doctor which occur in give-and-take in- 
teraction before he responds to the patient are 
recorded as if they were failures to respond. 
Thus an interviewer who adequately follows 
the standardized interview behavior will score 
one failure to respond for each of his units, 
and thus will earn a score of 1.00 instead of 
0.00 on synchronization. 

A word regarding the shape of the distribu- 
tion of scores for each of these variables is in 
order. All variables yielded normal, or bell- 
shaped distributions, except action and silence 
(and their derivatives—tempo and activity). 
These latter yielded J-shaped distributions 
with skewing toward the high end. The recent 
findings of Norton (quoted in Lindquist) 
would indicate that little precision was lost in 
not normalizing the individual scores prior to 
statistical analysis (13, pp. 78-86). If nor- 
malizing is felt to be necessary the various 
transformation methods used by Goldman- 
Eisler with similar data are applicable (7, 8). 

The remarkably stable results shown in 
Table 5 represent values determined on the 
basis of the total interview. However, in view 
of the fact that the interview (and inter- 
viewer behavior) described here is standard- 
ized for each of its five subparts, it is of in- 
terest to examine the consistency or stability 
of patient interactional behavior in each of 
these separate periods. Coefficients of
correlation are given in Table 6, while means
and standard deviations are shown in Table 7, 
for the analysis by periods. 

As one would expect, and statistical theory 
can describe (the Spearman-Brown phenome- 
non), individual samples (periods) are in gen- 
eral less reliable than the total sample (over- 
all interview). Thus patients’ interactional 
behaviors during the 10-minute first period, 
or 5-minute fifth period, for example, are less 
reliable from interviewer to interviewer than 
are the values based on the total 32.8 minute 
interview.⁵ Values for the total interview are
reproduced from Table 5 for comparison. 
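
The Spearman-Brown relation mentioned above can be sketched as follows. This is an illustration in Python under stated assumptions, not a computation made in the study: the formula assumes that the shorter samples are parallel parts of the longer one, whereas the five periods here differ in length and in interviewer behavior, so the result is only a rough guide. The period reliability of .50 is a hypothetical value, and `spearman_brown` is an illustrative name.

```python
def spearman_brown(r_part, k):
    """Predicted reliability when a measure is lengthened k-fold.

    r_part: reliability of the shorter sample.
    k: ratio of the longer length to the shorter one.
    """
    return k * r_part / (1 + (k - 1) * r_part)

# Hypothetical illustration: a 5-minute period with a between-interviewer
# r of .50, extended to the 32.8-minute total interview.
k = 32.8 / 5
predicted = spearman_brown(0.50, k)
print(round(predicted, 3))  # about 0.868
```

On these assumptions a modest period reliability is compatible with a much higher reliability for the interview as a whole.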

There are several striking findings shown 
in Table 6. Of the 45 correlations shown for 
the 9 variables in the 5 periods, 27 are sig- 
nificant at the .01 level and 10 reached the 
.05 level of statistical significance. Thus, only
8 period-variable combinations failed to reach 
statistical reliability. Three of these 8 are 
contributed by the patient’s synchronization 
variable—an artifact which results from the 
very limited range of scores found for this 
variable and described earlier for doctor’s 
synchronization. 
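
The thresholds given in the note to Table 6 are consistent with the usual relation between r and t for N = 20 patients (18 degrees of freedom). The following minimal Python sketch assumes two-tailed critical t values of 2.101 and 2.878, taken from a standard t table; the paper does not state how its thresholds were obtained, and the function name is illustrative.

```python
from math import sqrt

def critical_r(t_crit, n):
    """Smallest |r| reaching significance, given the critical t for n - 2 df."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)

N_PATIENTS = 20
# Two-tailed critical t values for 18 df, copied from a standard t table
# (an assumption of this sketch, not a figure reported in the paper).
T_05 = 2.101
T_01 = 2.878

r_05 = critical_r(T_05, N_PATIENTS)
r_01 = critical_r(T_01, N_PATIENTS)
print(round(r_05, 3), round(r_01, 3))  # 0.444 0.561
```

Any coefficient in Table 6 can then be classed by comparing it with these two values.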

Reading across variables for each of the 5 
periods, one is struck by the consistently low 
reliabilities found in period 3. Thus, while 
periods 1, 2, 4, and 5 have, respectively, 7, 
8, 7, and 5 r’s which reach the .01 level and 


5 An example analogous to the situation here de- 
scribed is the low test-retest reliability of some of 
the Wechsler-Bellevue subtests (.60), while the re- 
liability of the Full Scale IQ (derived by a weighted 
adding of these subparts) is remarkably high, with an 
r of .97 (19, p. 13).

Table 7

Means and Standard Deviations for Two Interviewers for Each of the Major I-C
Variables in Each Period

               Pt.’s units      Pt.’s action     Pt.’s silence    Pt.’s tempo
Period         Dr. 1   Dr. 2    Dr. 1   Dr. 2    Dr. 1   Dr. 2    Dr. 1   Dr. 2
I       M      19.55   18.35    74.45   65.25     7.40    8.65    82.25   74.30
        SD     10.10    9.20    68.02   45.92     2.44    2.50    66.90   44.30
II      M      12.85   12.35    60.90   69.40    12.70   11.55    73.90   81.10
        SD      3.02    2.24    59.40   81.60     8.24    6.33    56.57   79.56
III     M       9.35   10.75    67.75   79.25    10.70    9.30    78.80   88.90
        SD      5.05    5.70    44.70  127.76     4.52    2.72    45.35  127.80
IV      M      17.55   15.20    21.00    8.85     6.45    5.80    27.75   15.10
        SD      6.22    3.54    49.74    3.47     3.10    1.78    49.55    3.75
V       M      12.90   13.20    75.90   59.30     8.40    7.40    85.50   67.05
        SD      7.14    6.10   124.12   67.82     2.60    1.80   122.82   66.71
Total   M      72.20   69.85    59.85   56.25     9.20    8.30    57.20   52.85
        SD     25.55   20.90    56.90   54.60     3.11    2.41    36.87   27.26

                                                 Pt.’s
               Pt.’s activity   Pt.’s adjustment synchronization  Dr.’s adjustment Dr.’s units
Period         Dr. 1   Dr. 2    Dr. 1   Dr. 2    Dr. 1   Dr. 2    Dr. 1   Dr. 2    Dr. 1   Dr. 2
I       M      67.15   56.85    -1.17   -1.81      .94     .96     -.57    -.47    19.05   17.79
        SD     69.35   47.55     1.28    1.47      .07     .05      .26     .20     9.67    8.73
II      M      48.30   57.94    -1.11    -.99                     -9.90   -8.95     3.21    3.10
        SD     63.10   84.00     2.13    1.34      .18     .15     5.54    4.12
III     M      57.20   69.95    -2.16   -1.54     1.02    1.00    -1.06    -.74     9.68   10.63
        SD     44.65  127.75     2.31    1.86      .04     .08     1.13     .59     5.19    5.56
IV      M      14.50    3.10    -1.53    -.99      .98     .99      .91    1.28    17.84   15.21
        SD     50.25    4.45     1.80    2.41      .07             1.08     .70     5.30    2.04
V       M      67.65   51.85    -1.50    -.95      .96     .95     -.81    -.61    12.74   13.00
        SD    125.10   68.75     1.09    1.20      .06     .08      .54     .27     6.28    5.03
Total   M      39.55   36.20    -1.53   -1.28      .84     .84    -1.83   -1.53    62.79   59.74
        SD     40.49   30.01     1.33    1.17      .07     .06     1.17     .80    22.78   18.28

1, 0, 2, and 3 which reach the .05 level,
period 3 has no values at the .01 level and
only 4 at the .05 level.

Since period 3 is a free give-and-take period
similar to periods 1 and 5, the explanation
for this discrepancy must be sought elsewhere
than in the structure of the period itself.
A tentative hypothesis is that the stress of
period 2 (silence) is such as to make
unpredictable from interviewer to interviewer
the immediately following behavior of the
same patient. Put differently, a group of 20
patients will react to silence stress (as seen
in their immediately following behavior,
period 3) differently, depending upon the
doctor who is causing the stress. A still
further description of this same phenomenon
is: interaction patterns of patients seem
relatively invariant from doctor to doctor
under all conditions explored other than that
involving silence stress.

The reason for this finding is not apparent
at present. It may reflect, among other things,
a true patient-instability in this regard
relative to other patient characteristics, or
it may reflect actual differences in doctors’
silence behavior in period 2. Slight evidence
for the latter was demonstrated by an analysis
of the effect of “order” of interviewer upon
patient interaction patterns. By use of a
covariance design utilizing two conditions
(an AB order for 10 Ss and a BA order for the
other 10 Ss), values for the over-all interview
for each of the variables were compared. Thus
it was possible to determine, for example, if
Dr. 1 elicited more units (etc.) from the same
patients relative to Dr. 2, depending upon
whether he, Dr. 1, interviewed these
patients first or second. Despite individual 
differences in the means for these variables 
for the 10 patients interviewed under condi- 
tion I, relative to the values for the 10 dif- 
ferent patients interviewed under condition 
II, there were no differences in the means 
(M₁ versus M₃, and M₂ versus M₄) between
the two doctors, for eight of the nine vari- 
ables. Thus, for these variables, it made no 
difference which doctor interviewed first or 
which interviewed second. 

This was not the case with the patient’s 
“average duration of silence” variable, how- 
ever. For this variable it was shown that in 
the second series of 10 patients, Doctor 2 
coming first had the significant effect (F = 
4.66, p= .05) of eliciting more silence in 
these 10 patients when they were seen im- 
mediately following this experience by Doc- 
tor 1. On the other hand, Doctor 1 did not 
produce this effect on the 10 patients he saw 
first as reflected in their “silence” behavior 
pattern with Doctor 2, who saw them
immediately afterward. The values of M₁ and
M₂ were 7.9 and 9.1, while the values for M₃
and M₄ were 9.3 and 8.7. Thus Doctor 2 had
the effect of changing the patients’ average 
duration of silence with Doctor 1 from 7.9 to 
8.7 when he, Doctor 2, came first, whereas 
the silence of these patients with him was the 
same whether he came first (9.3) or second 
(9.1). If this is the variable which accounts 
for the marked effect (instability) shown in 
period 3 and discussed earlier, it is certainly 
a very subtle factor and could only be dis- 
covered by an instrument of measurement as 
sensitive as the interaction chronograph. Fur- 
ther experiments with the silence variable are 
therefore indicated. 

It is of interest to note that spontaneous 
comments from three “normal” subjects in- 
terviewed by the standard interview method 
indicated that they found the “silence treat- 
ment” more stressful than the “interruptions.” 
These three subjects, like a dozen or so psy- 
chologists and psychiatrists who have ob- 
served one or more of these interviews 
through a one-way screen, were relatively 
unaware of the interruption stress, i.e., they 
did not perceive the change in interviewer
behavior in period 4. The silence stress was
apparent, however. Possibly there are cul- 
tural factors involved here. Each of us has a 
long history of reinforcement in regard to 
being interrupted by others. However, in our 
culture, it is very unusual for anyone to fail 
to respond to us. Thus, silence stress is an 
unusual experience and is therefore possibly 
more disrupting than interruption. This specu- 
lation would appear to have some relevance 
in view of Hebb’s theoretical analysis of the 
search by various organisms for an optimal 
level of excitement or stimulation from their 
environment (11, pp. 551-552). 

Another point worth noting in Table 6 is 
the r of .136 for patient’s adjustment in pe- 
riod 2. Because of the nature of period 2 and 
the method of measurement, this period-vari- 
able does not reflect the similarity of the 
patient’s adjustment with the two doctors. 
Rather it reflects the correlation between the 
behaviors of the two doctors. This counter 
records the “durations of the patient’s inter- 
ruptions” and “durations of the patient’s fail- 
ures to respond to the doctor’s last utter- 
ance.” However, in period 2 the doctor’s 
standardized behavior involves not respond- 
ing to the patient in an effort to assess the 
latter’s level of initiative. Therefore the pa- 
tient’s adjustment counter can record few if 
any failures of the patient to respond to the 
doctor and still fewer interruptions of him 
since he is not responding. The r of .136 is 
an artifact and reflects the very restricted 
range derived from the relatively few times 
the two doctors restimulated the patients dur- 
ing this period with the subsequent failure to 
respond on the part of the patients. 

A truer reflection of the reliability across 
two doctors for the patients’ adjustment un- 
der the stress of silence will be found in the 
Dr.’s Adjustment column in period 2. This 
counter, recording how long each doctor 
failed to respond to the patient before the 
latter took the initiative, thereby stopping 
this counter, yields a clearer picture of the 
stability of a patient’s adjustment to silence. 
The r of .864 indicates that patients were re- 
markably similar in their adjustment to the 
interviewer’s silence in the two interviews. 

Finally a word should be said about the 
relatively higher r’s found in period 2 (most
of them in the .80’s) as compared to the r’s
of period 4 (with five r’s below .60). Why 
should patients show more invariance during 
silence than during interruption? (Note this 
is not incompatible with their subsequent re- 
flection of this “greater” silence stress in pe- 
riod 3 as discussed earlier.) The answer is not 
found in the present data and therefore more 
study is required. However, from a purely 
empirical point of view it can be pointed out 
that period 2 is a much “slower” period, con- 
taining on the average little two-way inter- 
action, as such. Accordingly it is an easy pe- 
riod for the observer to record. Thus one 
would expect high reliabilities. Period 4, on 
the other hand, typically involves a “fast 
pace” with both patient and doctor talking at 
the same time. The observer has his most 
difficult time during this period. Hence, a 
possible explanation for the lower values of 
r. In order to check this point, as well as the 
reliability of observation during the total in- 
terview, we have designed a study in which 
single interviews will be recorded simultane- 
ously and independently by two observers 
using two interaction chronographs. 

Chapple has stated that the behavior of 
patients during the two stress periods yields 
important information regarding individual 
personality make-up. Accordingly, he has pro- 
vided two measures of this behavior. The first 
is the Initiative shown by patients during the
interviewer’s silences of period 2, while the 
second is the Dominance pattern of the pa- 
tient during the interruption stress of pe- 
riod 4. 

Initiative is defined as the frequency with 
which a patient can respond or interact again 
when his partner fails to respond to him. It
is measured in the interaction chronograph
by a frequency counter which adds one each 
time the subject initiates an action follow- 
ing a period during which both people are 
silent and subtracts one each time the in- 
terviewer starts an action (because the sub- 
ject failed to do so for 15 seconds). This fig- 
ure will be negative if the interviewer is 
forced to start more actions than the subject, 
and positive if the subject shows more initia- 
tive by starting more actions than the inter- 
viewer. In order to control for differences in 
number of units during this period, the net 
count described above is divided by the total 
number of units during period 2. 
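
The scoring rule just described can be sketched as follows. This is a minimal Python illustration, not the instrument's own procedure; the function and parameter names are hypothetical, and the counts in the example are invented for demonstration.

```python
def initiative_score(patient_starts, interviewer_starts, total_units):
    """Net initiative per unit for the silence period (period 2).

    patient_starts: times the patient resumed after a mutual silence.
    interviewer_starts: times the interviewer had to start instead.
    total_units: total number of units in the period.
    """
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return (patient_starts - interviewer_starts) / total_units

# Hypothetical counts, not taken from the study.
score = initiative_score(patient_starts=11, interviewer_starts=1, total_units=12)
print(round(score, 2))  # 0.83
```

A negative value results whenever the interviewer is forced to start more actions than the patient.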

Dominance is a measure of an individual’s 
relative ability to “talk others down” in 
double actions. It is a reflection of one’s 
ascendance-submission pattern. Measurement 
of this variable is made only during the fourth 
(interruption) period. The obtained score is 
the difference in the number of times one 
person out-talks or out-acts (dominates) the 
other when both have been talking or acting 
at the same time. If the subject dominates 
the interviewer the greater number of times, 
the score is positive; if he submits more fre- 
quently to the interviewer’s interruptions, the 
score is negative. 
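
A minimal Python sketch of this score follows. The text defines it as a difference in counts. The optional division by the number of double actions is an assumption of this sketch, suggested only by the fact that the values reported in Table 8 lie between -1 and +1, and is not described in the text. All names and the sample record are hypothetical.

```python
from collections import Counter

def dominance_score(outcomes, normalize=False):
    """Score the double actions of one subject.

    outcomes: one entry per double action, "subject" if the subject
    out-talked the interviewer, "interviewer" otherwise.
    normalize: if True, divide by the number of double actions
    (an assumption of this sketch, not stated in the text).
    """
    counts = Counter(outcomes)
    diff = counts["subject"] - counts["interviewer"]
    if normalize:
        total = counts["subject"] + counts["interviewer"]
        return diff / total if total else 0.0
    return diff

# Hypothetical record of six double actions in period 4.
record = ["interviewer", "subject", "interviewer",
          "interviewer", "subject", "interviewer"]
raw = dominance_score(record)
scaled = dominance_score(record, normalize=True)
print(raw, round(scaled, 2))  # -2 -0.33
```

The negative sign indicates a subject who submitted to the interviewer's interruptions more often than he dominated.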

Results for these two variables are shown 
in Table 8. Here one finds a striking stability 
in the initiative variable (r= .805) and a 
lower but still significant reliability (r =
.470) for the dominance pattern. The earlier
comments regarding possible observer error 
in period 4 are again pertinent here and stress 
the need for the mentioned observer-reliability 
study. The mean values for initiative and 
dominance in Table 8 indicate that patients 


Table 8

Means, Standard Deviations, Ranges, and Coefficients of Correlation for Two Interviewers
for Initiative (Period II) and Dominance (Period IV)

Variable                          Dr. 1            Dr. 2            r
1. Mean initiative (Period II)    .75              .77              .805
   SD                             .18              .19
   Range                          .17 to .94       .23 to 1.00
2. Mean dominance (Period IV)     -.32             -.42             .470
   SD                             .31              .30
   Range                          -.71 to +.83     -.92 to +.29

show a fairly high rate of initiative under
silence and tend, on the average, to submit 
more than they dominate when interrupted 
by an interviewer. There are, however, very 
marked individual differences in these inter- 
action patterns, as can be seen in the SD’s 
as well as in the ranges of variation. 

Further effects of the silence and interrup- 
tion stress can be seen in Table 7. Here it is 
shown that, relative to their own values in 
the other four periods, patients have a longer 
length of “silence” in period 2 and a consid- 
erably shorter length of “action” in period 4. 
The longer average patient silence in period 
2 is of interest since it indicates that patients’ 
interaction patterns, while relatively invari- 
ant on any one day from interviewer to in- 
terviewer, when these latter are following a 
standardized pattern, are nevertheless sensi- 
tive to changes in the stimulus situation 
(intra-interviewer’s behavior). Thus in period 
2, when the interviewer deviated from his 
period 1 pattern of responding immediately 
at the end of the patient’s utterance (from 
silences of less than 1 second, to 15 seconds), 
the effect was to produce a concurrent in- 
crease in the patient’s own duration of silence 
(from 7.40 to 12.70). It was subtle observa- 
tions such as these which led Chapple to 
standardize the interviewer’s behavior. 

At this point a word might be said about 
the implication of this latter finding in regard 
to the question of stability or instability of 
interaction patterns. The results discussed 
earlier indicate that when the behavior of in- 
terviewers is standardized to some degree one 
gets highly consistent patterns from subjects. 
However, this invariance is for comparable 
samples of inter-interviewer behaviors and 
not for intra-interviewer changes. Thus, 
changes in the interviewer’s behavior from 
periods 1 to 2 are associated with changes 
in interviewee reaction patterns, with the 
new patterns having generally high reliabil- 
ity. It would appear, therefore, that there 
is invariance (reliability) for comparable 
stimulus conditions and lack of invariance 
(changeability) for changing stimulus condi- 
tions. Without the latter one would be faced 
with an inflexibility of interaction behavior 
which would make the study (and eventual
predictability) of such patterns of limited
usefulness in psychology. 

The question can be raised here as to 
whether the obtained stability in interaction 
patterns from interviewer to interviewer for 
any one interviewee was due to the latter’s 
repeating the “same story” to the second in- 
terviewer, or some similar artifact. This was 
probably not so since verbatim records of the 
interviews show that there were often major 
differences in content between the first and 
second interviews. Military experiences might 
be a prominent feature of the first interview, 
yet hardly alluded to in the second; resented 
desertion by a spouse might dominate one or 
more subsections of the first interview, and 
not be mentioned in the second, etc. The rec- 
ords show also that psychotherapeutic changes 
could occur in the first interview and be re- 
ported in the second, prefaced by a clearly 
identifying remark such as, “Until I talked 
to the other doctor today, I didn’t realize 
that . . .” etc. Some patients were obviously 
less emotional in the second interview than in 
the first. These differences between the two 
consecutive interviews in content, attitude, 
insight, emotionality, etc., will be examined 
in detail by a variety of methods of analysis 
of the verbatim materials. However, in spite 
of these clinically obvious differences in pa- 
tients in the two interviews, there was still a 
marked stability in interaction patterns for 
any one patient. 

Some postinterview comments made either 
to the interviewers or to doctors in other 
clinics, such as “I liked you more than the 
other doctor—I didn’t like him at all,” “I 
was upset by those doctors—I revealed things 
and said things I hadn’t thought of for years,” 
etc., emphasize that (a) the standardized
interview is compatible with certain patient
reactions common to any initial psychiatric
interview, (b) the interactional pattern of the
interviewer is not mentioned by the patient, 
who is presumably not fully aware of it, 
and (c) the possibility exists of utilizing the 
standardized interview as a systematic ob- 
servational device that can be interpolated a 
number of times at different points in a pro- 
longed psychotherapeutic or other interac- 
tional relationship, without “accommodation” 
or practice effects.

Discussion


The results of this study would tend to in- 
dicate that considerable stability of patient in- 
teraction patterns is demonstrable if there is 
some control of the stimulus situation. Con- 
trol of the stimulus has been overlooked all 
too often in psychological and psychiatric re- 
search with the interview. Our experience has 
indicated that the interviewer may be a suffi- 
ciently controlled or independent variable 
without at the same time introducing un- 
natural or stereotyped interviewing behavior. 
Neither the Ss themselves nor uninitiated, 
though experienced, professional interviewers 
who observed the interview indicated by their 
comments that the standardized interview, 
when conducted by an experienced inter- 
viewer, is very different from other psychi- 
atric interviews. 

However, the demonstration of reliability 
under the conditions here investigated does 
not exhaust the questions concerning reli- 
ability. The obtained reliability indicates a 
stability in patient personality patterns with 
two interviewers on the same day. We are 
currently conducting a reliability study in 
which a single interviewer interviews the 
same patient on two occasions one week 
apart. Thus the completed study reported 
here involved same-day reliability across two 
interviewers, while the study now under way 
is a test-retest reliability study over time, 
with a single interviewer. Finally, since the 
high reliabilities reported here for our first 
experiment are unusual in psychological re- 
search involving complex variables such as 
one finds in the interview, we have felt it 
necessary to cross validate our findings and 
are currently doing this in a replication uti- 
lizing an identical design. 

There are several other aspects of the re- 
liability problem which should be pointed out. 
The first of these is the reliability of the
observer’s recording of the interview. He is an
integral part of the instrument and the final 
results will be no more reliable than the re- 
liability or accuracy of his input. We plan 
soon to check this aspect of the reliability 
problem by having two observers record the 
same interview, independently, across a series 
of patients. 


Also to be examined is the reliability of the 
scorer who abstracts the objective scores 
from the numbers printed by the counters. 
The interaction chronograph yields cumula- 
tive scores on the several variables and thus 
scoring, which involves primarily simple arith- 
metic skills, is a reasonably objective pro- 
cedure. For the study here described a second 
person followed Chapple’s manual of instruc- 
tions and scored independently the interaction 
chronograph records of ten of the interviews 
selected at random. The results indicated per- 
fect agreement between two scorers on 96 per 
cent of the 600 individual final scores in- 
volved. The magnitudes of the errors involved 
in the remaining 4 per cent were very mini- 
mal; they were of the order of 1 unit in 
whole number scores and .06 unit in those 
variables involving hundredths of a unit.
Thus it would appear that scorer reliability 
presents no problem in these observations. 
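
The agreement figure can be restated in counts. The minimal Python sketch below converts the reported 96 per cent of 600 scores into numbers of agreeing and disagreeing scores, and shows a hypothetical helper for computing exact agreement from two scorers' records; the toy records are invented and are not data from the study.

```python
def exact_agreement(scores_a, scores_b):
    """Proportion of paired scores on which two scorers agree exactly."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both scorers must rate the same items")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)

# Reported figures: 600 final scores, 96 per cent in perfect agreement.
TOTAL_SCORES = 600
agreeing = round(0.96 * TOTAL_SCORES)
disagreeing = TOTAL_SCORES - agreeing
print(agreeing, disagreeing)  # 576 24

# Hypothetical toy records: five paired scores, one of which differs.
rate = exact_agreement([72, 60, 9, 57, 40], [72, 60, 9, 57, 41])
print(rate)  # 0.8
```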

A question which is of considerable theo- 
retical as well as practical interest is the re- 
lation between content of verbalizations and 
the interaction patterns of individuals during 
the interview. Chapple has taken the position 
that these patterns are relatively invariant 
and are negligibly related to the topics one 
is discussing (4, p. 33). We plan to investi- 
gate this interesting question in subsequent 
research. We plan also to examine the pos- 
sibility of differences in interaction patterns 
in various diagnostic groups. There is reason 
to believe that schizophrenic patients may 
have patterns different from those of hys- 
terics, or of manic-depressive patients, etc. 
So far, only a few patients have been ex- 
amined with regard to the possibility that in- 
teraction chronograph variables may reliably 
discriminate one category from another (5). 
The interaction chronograph may permit a 
refinement in differential diagnosis by its po- 
tential for objective or operational definitions 
of patient characteristics. 

Finally, the question of validity of the in- 
teraction chronograph scores must be consid- 
ered. Establishing the reliability of an instru- 
ment is only the first step. Consideration of 
the meaning of the obtained scores is, of 
course, a necessary second step. The ques- 
tion of validity is a complex one and will be 
approached in a number of different ways
(predictive, content, concurrent, and
construct validity). The study described above
involving different patient groups is one of 
these approaches. However, there are various 
others. For example, one can ask, What are 
the behavior correlates in real-life situations 
of these measured interaction chronograph 
variables? Other approaches might involve 
use of the method in assessing the effects of 
psychotherapy in time; the immediate test- 
retest changes in interaction patterns follow- 
ing the administration of controlled doses of 
drugs which induce affective changes that can 
be reliably communicated by verbal report; 
the influence of status and role of the inter- 
acting participants upon the measurements, 
etc. Finally, an approach which may have 
considerable importance is the possibility of 
developing techniques for planned behavior 
therapy; i.e., will having the interviewer in- 
crease the length of his utterances with de- 
pressed patients increase the duration of ut- 
terances of the latter?, etc. 


Summary 


The present study was concerned with two 
problems: an investigation of the reliability
of the interaction chronograph as an
instrument of research and the corollary problem
of the variance or invariance of interaction
patterns of patients during a standardized
psychiatric interview. Twenty outpatients
were interviewed independently by two dif- 
ferent interviewers under previously defined 
and practiced conditions. The highly signifi- 
cant correlations obtained indicate reliability 
of the instrument and a marked invariance or 
stability in patient interaction patterns when 
the stimulus conditions are relatively stand- 
ardized (inter-interviewer consistency) and a 
flexibility in these interaction patterns when 
these stimulus conditions are changed in a 
predefined manner (intra-interviewer vari- 
ance). A number of methodological questions 
were discussed and brief descriptions of re- 
search now under way were given. 


Changes in Some Moral Values Following
Psychotherapy 


David Rosenthal 


One problem facing the researcher in psy- 
chotherapy is that of accounting for the suc- 
cesses achieved by therapists who differ 
widely in both their theories of neurosis and 
its treatment. Eysenck (6) has questioned 
the validity of all these reported successes. 
Others, less skeptical, have considered the 
question of what these therapists do or have 
in common, reasoning that this must be the 
salient therapeutic force. Fiedler (7) in an 
investigation of three orientations to psycho- 
therapy, the Freudian, Adlerian, and Rog- 
erian, concluded that it was the quality of the 
relationship which was common to successful 
therapists of all three schools. 

However, the question is still open with re- 
spect to what transpires, as a result of this 
relationship being offered, that makes for a 
successful outcome. Therapists generally offer 
the explanation that the relationship permits 
the unlearning of some responses and the 
learning of others to occur, but they do not 
fully agree as to what constitutes the critical 
learning. Rogers (11) is coming to believe 
that it is a learning to accept fully and with- 
out fear the “positive feelings of another.” 
Mowrer (10) feels that the patient has to 
unlearn problem-solving behavior which has 
been used to prevent certain sign-learning 
from occurring. Shoben (13) sees therapy as 
remedying defects in social learning, the pa- 
tient learning to react without anxiety to hu- 
man interactions which involve impulses that 
have become associated with punishment. 
Dollard and Miller (4) emphasize the fact 
that the relationship permits previously in- 
hibited fear responses to occur so that they 
may be extinguished. 

Mowrer (9) counterposes his own position
to that of the psychoanalysts, which is essen- 
tially that of Dollard and Miller. He does not 
feel that the neurotic’s difficulties arise be- 
cause of overrestraint and unrealistically high 
moral standards, and he does not believe that 
superego functions or values can be influ- 
enced by psychotherapy. The study reported 
here bears on this issue. In the main, we un- 
dertook to test the following hypotheses: that 
values do change in psychotherapy, particu- 
larly moral values since these are thought to 
be at the center of neurotic conflicts; that 
the patient learns to accept the moral values 
of the therapist, and that alleviation of 
psychological distress accompanies this new 
learning. It was hypothesized too that values 
such as those found in the Allport-Vernon 
Scale of Values would not be systematically 
influenced by psychotherapy, since these 
would not ordinarily be at issue either in the 
neurosis or its treatment. 


Procedure 


Patients at the Henry Phipps Psychiatric 
Clinic were accepted for the study only when 
they and their doctors agreed to take all 
the necessary tests. No patients diagnosed as 
psychotic were included. Some high school 
education was also made a requirement for 
inclusion since many of the items on the 
tests could not be understood by people of 
low education or intelligence. 

Many patients who were considered eligible 
never completed the experiment. Some were 
too upset to take the tests or had had shock 
therapy. Others took the tests initially but 
left against advice shortly thereafter, either 
before arrangements could be made for retesting,
or else with a flat refusal to be retested.
As it turned out, it took two years to accumu- 
late twelve patients who were tested before 
and after therapy. Of these, nine were hos- 
pitalized or “house patients,” while the re- 
maining three were treated on an outpatient 
basis. Six were male, six female. Diagnoses 
included psychoneurotic, personality, psycho- 
physiologic, and adjustment disorders. Ages 
ranged from 18 to 46, with a mean of 29.5 
years. Length of treatment ranged from 3 
weeks to one year, with a mean of 5 months. 

At some point early in therapy, after fewer 
than six therapeutic interviews, the patients 
were given four tests: Frank’s Symptom-Dis- 
ability Check List; the Allport-Vernon-Lind- 
zey Scale of Values (1); the Butler-Haigh (2) 
Self Concept items, from which was derived 
a modification of the Dymond Adjustment 
Scale (5); and a Moral Values Q sample de- 
vised by the author. These tests were repeated 
at the end of treatment. The therapists, all 
psychiatric residents at the Henry Phipps 
Psychiatric Clinic, were given the Allport- 
Vernon-Lindzey Scale of Values and the 
Moral Values Q sample. These tests were not 
readministered to the therapists at the end 
of treatment since it was not felt that they 
would show any predictable change. It could 
have been hypothesized that the therapist 
also assimilated some values of the patient, 
but when it is realized that a therapist treats 
many patients at the same time, each hold- 
ing a different set of values, then it seems 
doubtful that he can assimilate the values of 
all in any readily measurable way. At any 
rate, it was assumed in this study that the 
therapist’s values were stable. 

At the end of treatment, each patient was 
interviewed by the author who had no knowl- 
edge of how the final and initial test results 
compared. The interview was open-ended, 
the patient being given an opportunity to air 
all thoughts and feelings bearing on his thera- 
peutic experience. Almost all welcomed the 
opportunity to tell how they felt. During the 
interview, the author directed the discussion 
to the following questions: How did the pa- 
tient feel as compared to when he began 
therapy? If he felt better, how was he better? 
If worse, how worse? If he felt better, what
did he think helped him to feel better? How
did he feel about his therapist? What did he 
feel about therapy, terminating therapy, the 
future? No mechanical recording was made 
of the interview since it might have reduced 
rapport and the willingness to speak freely. 
Immediately after the interview, a detailed 
report of it was made by the author. Bias 
could not have entered into the reporting 
without prior knowledge of the test results. 
At the end of the experiment, all twelve in- 
terviews were rated independently by three 
judges on a 7-point scale as to whether the 
patients benefited or not by their therapeutic 
experience. 


The Allport-Vernon-Lindzey Scale of Values was 
administered in the usual way. The Symptom-Dis- 
ability Check List was developed by Dr. Jerome D. 
Frank and his colleagues in the course of research 
on the response of patients to individual and group 
therapy. It has not yet been published. It consists of 
41 symptoms such as “headaches,” “feeling blue,” 
“unusual fears,” etc., which are rated by patients on 
a 4-point scale of distress, and 12 additional items 
rated on a 6-point scale of severity. 

The Butler-Haigh set of 100 statements, intended 
by its authors as a Q sample, was administered to 
our patients by asking them to sort the statements 
into two piles: 1. Do not describe you well; are not
so true of you; are more unlike you than like you; 
do not apply to you so well. 2. Describe you fairly 
well; are true of you for the most part; are like 
you more than unlike you; apply to you more than 
not. Since we wished by this procedure to obtain a 
score based on the Dymond Adjustment Scale, this 
dichotomous sorting was quite adequate and it was 
a simple task for the patients. 

The Moral Values Q sample consists of 60 state- 
ments based on three main areas of behavior around 
which psychological conflicts commonly arise: sex, 
aggression, and authority. Each area in turn is sub- 
divided so that two generally contrasting sets of atti- 
tudes are represented with respect to each area. 
Thus, there are six categories: 


1. Sex-rigid: sex is seen as evil, perhaps a neces- 
sary evil, dirty, to be shunned, postponed, or sup- 
pressed. 

2. Sex-free: sex is seen as good, healthy, natural, 
to be accepted, enjoyed, understood. 

3. Antiaggressive: acts of aggression, violence, or 
pain infliction are thought of as wicked, to be 
avoided, suppressed, suffered. 

4. Aggressive: aggression is thought of as neces- 
sary, rewarding, to be accepted, expressed. 

5. Disciplinarian: authority, discipline, order, and 
regulation are seen as absolute, necessary, desirable, 
to be accepted, supported. 






6. Libertarian: authority is seen as a relative mat- 
ter, with individual responsibility and reason, pri- 
mary. 


Moral values were thought to involve be- 
havior which is seen as right or wrong, good 
or bad, permissible or not permissible, worthy 
of reward or punishment. Items were sought 
which would reflect such attitudes. Ten items 
selected from therapy protocols and from 
other sources were written for each category. 
Controversial issues which aroused strong 
feelings were given priority since it was felt 
that conflicts would more generally arise 
around such issues. The statements are as 
follows: 

Sex-Rigid 

3. It is a wife’s duty to have intercourse with her 
husband. 

10. Girls should come to the marriage bed as virgins. 

15. It is bad taste to tell dirty jokes in mixed com- 
pany. 

22. Men who have sex relations outside of marriage 
are low and contemptible. 

27. Divorce is never justified. 

33. People should not marry before they are twenty- 
one. 

39. Sexual intercourse should be had only for the 
purpose of having children. 

46. The best and truest love is one without sexual 
thoughts and feelings. 

52. Pulp magazines featuring sexy love stories should 
be abolished. 

57. Masturbation is a sin. 


Sex-Free 


4. It’s perfectly all right for teen-agers to kiss and 
neck. 

9. Men ought to have some sexual experience be- 
fore marriage. 

16. Sexual intercourse provides one of life’s most 
beautiful experiences. 

21. A woman may be justified in having another 
man if her husband is not satisfying her sexual 
needs. 

28. Prostitution should be legal. 

34. Unnatural sex acts should not be considered a 
crime. 

40. Sex-play before intercourse makes it more satis- 
factory. 

45. Sex education should be taught in public schools. 

51. One should expect one’s wife or husband to be 
attracted to handsome members of the opposite 
sex. 

58. If young children play with their sex organs, 
parents need not be worried or concerned. 


Antiaggressive 


1. If someone strikes you on one cheek, turn the
other.

12. Using live animals for medical research is wrong.

13. Judge not lest ye be judged.

24. One should always try one’s best to get along
with others.

25. It is wrong to hate.

31. Capital punishment is wrong.

37. If you’re angry, count ten and try to let the
feeling go away.

Wars are never justified.

The meek shall inherit the earth.

Violence breeds violence.


Aggressive 


A man who lives by the sword should die by
the sword.

Might makes right.

One should say what one thinks about other
people.

If people annoy you, either let them know it or
don’t bother with them.

Some people are no damn good.

Let the punishment fit the crime: an eye for an
eye.

To the victor belong the spoils.

A nation that will not defend its rights or its
honor deserves not to survive.

The race is to the swift and the strong.

You have to use brute force to set some matters
straight.


Disciplinarian


Labor unions should be strictly regulated.

One should always be busy and keep one’s mind
occupied.

Children should do as they are told.

Parents act in the best interest of their children.

Parents sacrifice a lot for their children.

“Thou shalt honor thy father and mother” is a
commandment which should always be observed.

Everyone should belong to some church.

What our country needs most are great leaders.

A soldier should always obey orders first, ask
questions afterward.

Our future lies in the hands of fate.


Libertarian 


There may be times when one should break the
law.

People should take time out to relax and think
about themselves.

Parents should always tell children why things
should or should not be done.

Many parents do not really love their children.

The trials of parenthood are more than equalled
by its rewards.

Parents often deserve to lose the respect of their
children.

God is created by man in his own image.

Famous leaders owe their greatness to the ones
they lead.

An employee should express his opinion, even
when the boss disagrees.

A person must take responsibility for his own
life.


Table 1

Rho Correlations of Criteria of Improvement

Criteria                           Rho    df (N - 2)   Significance level†
Interviews and SDCL                .61    10           .038
Interviews and Adjustment Score    .85     9*          .001
SDCL and Adjustment Score          .48     9*          .144

* Patient D, the first in our series, had his initial testing
before we had decided to use the Dymond Adjustment Scale,
so no measure of change on this criterion was available for
him.

† Significance levels were determined according to procedures
suggested by Kendall (8).


Patients and therapists were required to 
sort the 60 statements into eleven piles 
based on the strength of their belief or dis- 
belief regarding each statement. The number 
of statements to be placed in each pile was 
as follows: 1, 2, 4, 7, 10, 12, 10, 7, 4, 2, 1. 
The rationale underlying these procedures has 
been discussed at length by Stephenson (15). 
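
As an illustration of the forced sort just described, the following Python sketch expands the eleven pile sizes into one pile score per statement and correlates two sortings. It is not the author's scoring procedure: the numbering of piles from 0 to 10, the use of a product-moment correlation, and the two example sortings are assumptions made only for illustration.

```python
from math import sqrt

# Forced distribution quoted in the text: statements per pile, eleven piles.
PILE_SIZES = [1, 2, 4, 7, 10, 12, 10, 7, 4, 2, 1]

def pile_scores(pile_sizes):
    """Expand pile sizes into one score per statement (pile 0 through pile 10)."""
    return [pile for pile, size in enumerate(pile_sizes) for _ in range(size)]

def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

scores = pile_scores(PILE_SIZES)

# Hypothetical sortings of the 60 statements (index = statement, value = pile).
patient_sort = scores[:]          # one arbitrary assignment of statements to piles
therapist_sort = scores[::-1]     # the same statements sorted in exactly reversed order

r_same = pearson_r(patient_sort, patient_sort)
r_reversed = pearson_r(patient_sort, therapist_sort)
```

Because every sorter must use the same pile sizes, all sortings share one mean and one variance, so the correlation between two sortings depends only on how the statements are paired across piles.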


Results 


The main hypothesis of the study was that 
patients who improved would tend to modify 
their system of moral values in the direction 
of their therapists’ moral values. A review of 
the literature makes it apparent that the problem
of criteria of improvement is an unsolved
one. We used three possible criteria: change 
on Frank’s Symptom-Disability Check List 
(SDCL); change on Dymond’s Adjustment 
Scale, based on the patient’s sortings of the 
Butler-Haigh statements; and judges’ ratings 
based on the post-therapy interviews. The lat- 
ter are admittedly crude, but they have the 
advantage that in the interview the patient 
could speak freely, and that where he was 
motivated to create an impression of im- 
provement or lack of it, this could be re- 
flected in what he said and taken into ac- 
count by the judges. 

Patients were ranked on all three criteria,
and the rankings were intercorrelated to
determine the comparability and stability of
the measures, as shown in Table 1.

From Table 1 it can be seen that the 
judges’ ratings based on the interviews cor- 
related significantly with ratings based on 
both the Symptom-Disability Check List and 
the Adjustment Scale, but that the latter did 
not correlate significantly with each other. It 
thus appeared that the judges’ ratings were 
the best single index of improvement. 

To determine the direction of change in patients’
moral values, each patient’s initial and
final sortings of the moral value items were
correlated with the therapist’s sorting. Table
2 shows the main data bearing on the hypothesis.


Table 2

Judges’ Improvement Ratings and Moral Values Correlations

                               Correlation      Correlation of patients’
          Mean of    Rank of   of patients’     sortings with
          judges’    judges’   initial and      therapists’ sortings                Rank of
Patient   ratings*   ratings   final sortings   Initial (1)   Final (2)   z2 - z1   z2 - z1
A         5.3         1.5      .47               .108          .371        .281      1
B         5.3         1.5      .72              -.003          .200        .206      3
C         4.8         3        .86               .664          .653       -.019      7
D         4.7         4.5      .48               .443          .535        .122      5
E         4.7         4.5      .84               .398          .402        .005      6
F         4.2         6        .70               .173          .399        .248      2
                                                                   Sum     .843
G         4.0         7        .67               .192          .015       -.179     11
H         3.0         8        .58               .345          .226       -.130      9
I         2.8         9        .54               .451          .406       -.055      8
J         2.3        10        .51               .245          .398        .172      4
K         2.2        11        .59               .268          .131       -.143     10
L         1.7        12        .62               .200         -.011       -.214     12
                                                                   Sum    -.549

* Ratings of Improvement: much improved = 6; moderately improved = 5;
slightly improved = 4; unimproved = 3; slightly worse = 2;
moderately worse = 1; much worse = 0.
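
The z2 - z1 column of Table 2 can be checked with a short Python sketch. It assumes, since the text does not say so explicitly, that z denotes Fisher's transformation of a correlation coefficient, z = atanh(r); under that assumption the recomputed differences agree with the tabled ones to within about .001. The correlations are entered as read from Table 2.

```python
from math import atanh

# Patient, initial r, final r, and tabled z2 - z1, as read from Table 2.
TABLE_2 = [
    ("A", .108, .371, .281),
    ("B", -.003, .200, .206),
    ("C", .664, .653, -.019),
    ("D", .443, .535, .122),
    ("E", .398, .402, .005),
    ("F", .173, .399, .248),
    ("G", .192, .015, -.179),
    ("H", .345, .226, -.130),
    ("I", .451, .406, -.055),
    ("J", .245, .398, .172),
    ("K", .268, .131, -.143),
    ("L", .200, -.011, -.214),
]

def z_change(r_initial, r_final):
    """Difference of Fisher z values (assumed z = atanh(r)), final minus initial."""
    return atanh(r_final) - atanh(r_initial)

# Largest disagreement between recomputed and tabled differences.
max_gap = max(abs(z_change(r1, r2) - tabled) for _, r1, r2, tabled in TABLE_2)

for patient, r1, r2, tabled in TABLE_2:
    print(f"{patient}: computed {z_change(r1, r2):+.3f}, tabled {tabled:+.3f}")
```

A positive difference means the patient's final sorting resembled the therapist's sorting more closely than the initial sorting did.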

The rho correlation of judges’ ratings of 
improvement and change in direction of 
therapist’s moral values equals .68, which, for 
df = 10, has a significance level of .016. If we 
compare the changes in moral values of those 
above the median improvement rating with 
those below the median, we find t = 3.013
which is also significant at the 2 per cent 
level. Thus, the main hypothesis appears to 
be supported by the data. 
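
Both statistics in the preceding paragraph can be approximately recomputed from Table 2. The Python sketch below is illustrative only: it assumes rho was obtained from the sum of squared rank differences with tied ranks left uncorrected, and that t is the usual pooled-variance t for two independent groups of six. The text does not state either formula, so small discrepancies from the published figures are to be expected.

```python
from math import sqrt

# Ranks for patients A through L, as read from Table 2.
judges_rank = [1.5, 1.5, 3, 4.5, 4.5, 6, 7, 8, 9, 10, 11, 12]
change_rank = [1, 3, 7, 5, 6, 2, 11, 9, 8, 4, 10, 12]

def spearman_rho(rank_x, rank_y):
    """Rho from summed squared rank differences (ties not corrected)."""
    n = len(rank_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Tabled z2 - z1 for the six patients above and the six below the median rating.
above_median = [.281, .206, -.019, .122, .005, .248]
below_median = [-.179, -.130, -.055, .172, -.143, -.214]

def pooled_t(a, b):
    """Two-sample t with pooled variance; df = len(a) + len(b) - 2."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

rho = spearman_rho(judges_rank, change_rank)
t = pooled_t(above_median, below_median)
print(f"rho = {rho:.2f}, t = {t:.2f}")
```

Under these assumptions rho rounds to the reported .68, and t comes out within about .01 of the reported 3.013, a gap attributable to rounding in the tabled differences.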

However, the changes in the moral values, 
when considered in relation to the therapist’s 
values, are not very large. This could mean 
that although the system of values is modi- 
fied, it is not profoundly altered. An alterna- 
tive explanation is that greater changes could 
not be expected since only two of the patients 
were rated as more than moderately im- 
proved. In one of these, Patient A, the 
changes in moral values were quite large. 

It appears too that the hypothesis needs to 
be expanded to account for the fact that pa- 
tients who were unimproved or worse tended 
to move away from the therapist’s value sys- 
tem, so that changes in moral values may be 
effected in both directions by the therapeutic 
situation. 

The occurrence of exceptions to the hypothesis, 
Patients C, E, G, and J, indicated that other factors 
may be involved as well as changes in moral values 
to account for improvement or lack of it. Patient C 
thought that what helped him as much as anything 
was the discovery that he was smarter than his 
therapist. According to patient E, who had been ex- 
perimenting with paranoid ideas and wanted to 
“sign out” of the hospital, he was told by his 
therapist that if he signed out he would have to be 
committed to a state hospital since as long as he 
believed his paranoid ideas he was insane. This was 
a “jolt” to him. Afterward, he began to question his 
“beliefs,” found out what purposes they served him, 
and finally gave them up. Thus, therapy in these 
cases took an unusual course where conflicts around 
moral values were not at issue as far as the pa- 
tients were concerned. Patient G “signed out” against 
advice. Her husband, in service, was being trans- 
ferred to another part of the country and she wanted 
to go with him, fearing to be left behind. She was 
thus bent on convincing everyone of her “improve- 
ment” so that they would let her go. When asked 
what she thought had helped her, she said that it 
was being around the other patients, especially those
on the first floor (psychotics) and learning from
them how she should be or should not be. She was 
discharged as unimproved by her therapist, rated as 
slightly improved by all three judges, and made her- 
self out to be considerably improved on the Symp- 
tom-Disability Check List and the Adjustment Scale. 
Patient J spontaneously remarked that therapy had 
taken away her former values and had not replaced 
them with anything. She felt “confused” at the time 
of her discharge but was planning to continue ther- 
apy on an outpatient basis. Thus, the effect found 
here might be one of therapy-in-transition. 


To test whether similar changes would oc- 
cur with respect to values of the type meas- 
ured by the Allport-Vernon-Lindzey Scale of 
Values, the score obtained by the patient in 
each of the six scale categories, both before 
and after therapy, was compared with his 
therapist’s scores in the same categories by 
means of the D technique (3). Four pa- 
tients’ profiles on this scale became more 
similar to their therapists’, eight less similar. 
Change in similarity to the doctor’s profile 
showed a rho correlation of .18 with amount 
of improvement. Since this is not significantly 
greater than zero, we cannot assume any sup- 
port for the hypothesis of a relationship be- 
tween changes in such values and improve- 
ment. This accords with our expectations, 
since it was believed that such values were 
not likely to be importantly involved either 
in neurosis or therapy. 
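
The profile comparison can be sketched in Python as follows, on the assumption that the D technique refers to the familiar profile-distance index, the square root of the summed squared differences between two profiles. The scores below are hypothetical and are not data from this study; only the six category names are those of the Allport-Vernon-Lindzey scale.

```python
from math import sqrt

CATEGORIES = ["theoretical", "economic", "aesthetic",
              "social", "political", "religious"]

def profile_distance(profile_a, profile_b):
    """Assumed D index: root of summed squared differences across categories."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

# Hypothetical scores in the order of CATEGORIES (illustration only).
therapist = [48, 30, 47, 41, 36, 38]
patient_before = [40, 35, 45, 42, 38, 40]
patient_after = [44, 33, 46, 40, 37, 40]

d_before = profile_distance(patient_before, therapist)
d_after = profile_distance(patient_after, therapist)

# A smaller D after therapy means the patient's profile moved toward the therapist's.
became_more_similar = d_after < d_before
```

In this made-up case D falls from about 10.1 to about 5.7, the kind of change counted above as a profile becoming more similar to the therapist's.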


Summary and Conclusion 


Twelve patients at the Henry Phipps Psy- 
chiatric Clinic were given a battery of four 
tests before and after psychotherapy. Their 
therapists were given two of these tests. Pa- 
tients who improved tended to revise certain 
of their moral values in the direction of their 
therapists’, while the moral values of patients 
who were unimproved tended to become less 
like their therapists’. This was not found in 
the case of values such as those of the All- 
port-Vernon-Lindzey scale. 

It was thought that the latter type of values 
showed no systematic change with respect to 
improvement since it was not likely to be 
at issue in the neurosis itself or in its treat- 
ment. Changes in moral values centering 
around sex, aggression, and authority were 
thought to occur because these issues are 
commonly involved in patients’ conflicts. 




If patients who improve change these moral 
values to accord more with their therapists’ 
moral values, it seems likely that the patients 
have learned to accept part of the latter’s
value system as their own. Most therapists, 
however, seem to take great precautions to 
avoid influencing their patients’ values in any 
way at all. Shoben (14) emphasizes that the 
patient must evolve his own set of moral 
values and that the therapist must not en- 
graft his own values upon him. It may be 
that the therapist communicates his values to 
the patient in many unintended, subtle ways, 
even when trying to avoid so doing. The pa- 
tient, who is often sensitized to the therapist’s 
every word and inflection, may be able to re- 
ceive these communications, and because of 
his trust, admiration, and respect for the 
therapist, may permit himself to be influenced 
by them. 

It seems likely, too, that there is a direct 
connection between one’s self concept and 
one’s system of moral values. These values 
are an important part of the standards by 
which the self is judged, yet not only the self 
but others as well. Thus, Sheerer’s (12) find- 
ing that there is a direct relationship between 
change in acceptance of self and change in 
acceptance of others is predictable when it is 
considered that both self and others are be- 
ing judged in the light of a changed set of 
values. 

The findings would not seem to support 
Mowrer’s notion that “superego functions or 
values can be influenced very little, if at all, 
by psychotherapy.” He places emphasis in- 
stead upon “reestablishing normal communi- 
cation between ego and superego and upon 
changing ego functions rather than superego 
values or standards” (9, p. 92). It may be
that the changes found in this study are primarily
evidences of better communication between
ego and superego, but perhaps it is
necessary for us to be able to differentiate
operationally these various structural concepts
before we can render any verdict about
processes affecting them.


Projection, Self Evaluation, and Clinical Evaluation
of Aggression¹


Anthony Davids,² Andrew F. Henry,³ Charles C. McArthur,
and Leo F. McNamara⁴


Harvard University 


What relation do psychologists expect be- 
tween the way a need is expressed in projec- 
tive tests and the way it is expressed in be- 
havior? According to an hypothesis of “direct 
projection,” the two expressions of the same 
need should be parallel; while according to 
an “hydraulic theory,” the amounts of expres- 
sion in fantasy and behavior should be in- 
verse. One might also make different predic- 
tions based upon more complex relations that 
have been proposed to exist between fantasy 
expression and behavioral expression of needs. 
Whatever theoretical position one chooses, 
empirical examination of hypotheses is facili- 
tated by assessment of some need that fre- 
quently gains expression both in fantasy and 
behavior. One of the most common, as well 
as one of the most important, both theoreti- 
cally and practically, is “n Aggression.” 

Within n Aggression, it is customary to 
differentiate between extrapunitive aggression 
and intropunitive aggression. In his original 
work on the Thematic Apperception Test 
(TAT), Murray (6) pointed to the clinical 
importance of the subcategory of aggression 
termed “n Intraggression.” Rosenzweig, also, 
has demonstrated the theoretical and empirical
utility of this dichotomy between inward- and
outward-directed aggression (7, 8). Moreover,
recent research (5) has shown cardiovascular
reactions to be related to “Anger In” and to
“Anger Out” emotional reactions in response to
stress. These concepts have also proved useful
in sociological studies of suicide and homicide (4).

¹ This study is an outgrowth of a program of research
conducted at the Harvard Psychological Clinic
under the direction of Professor Henry A. Murray
and supported by a grant from the Rockefeller
Foundation. The authors are indebted to the following
institutions for facilitating this report: the National
Institute of Mental Health (Grant M-700),
Public Health Service, the Laboratory of Social Relations,
and the Study of Adult Development at Harvard
University.

² Now at Brown University and Emma Pendleton
Bradley Home.

³ Now at Vanderbilt University.

⁴ Now at the Queen’s University of Belfast.

The present experiment represents an at- 
tempt to further explore relations among 
various measures of n Aggression. Specifically, 
we will relate measures of aggression derived 
from self ratings, clinical ratings, and projec- 
tive protocols. 


Method 


Subjects. The subjects (Ss) were 20 male 
college students who volunteered to serve as 
paid Ss in an intensive investigation of per- 
sonality. They participated in approximately 
40 hours of psychological examination, con- 
sisting of psychodiagnostics, experiments, and 
interviews, conducted by a team of investi- 
gators. 

Projective ratings. The TAT, administered 
individually to each S, was electrically re- 
corded and transcribed verbatim. An experi- 
enced clinical psychologist, who had not 
administered the test and who was not per- 
sonally familiar with the Ss, analyzed the 
stories given in response to cards 3, 6, 7, 8, 
and 13 of the male series. On the basis of his 
analysis, he rated the Ss in regard to both 
amount and direction of aggression. That is, 
upon analyzing each S’s TAT stories, the psy- 
chologist rated him as either high or low on 
aggression as well as rating him as either 
characterized by anger directed inward or 
outward. 

Self evaluation. Included in a comprehensive
inventory of self ratings was a scale designed
to measure conscious avowal of aggression.
The S was provided with a 6-point
rating scale running from “much less than
most people” to “much more than most people,”
and instructed to indicate his general
level of aggression. For purposes of the present
investigation, Ss who rated themselves as
more aggressive than most people were classified
as high on aggression and those who rated
themselves as less aggressive than most people
were classified as low on aggression.


Table 1

TAT Rating and Self Rating on Amount of Aggression

                TAT rating
Self rating    High    Low
High             6       5
Low              6       3

Fisher's p = .47.


Clinical evaluation. An experienced clinical 
psychologist, who had contact with the Ss 
over approximately an 18-month period, 
rated each S in regard to both amount and 
direction of aggression. That is, utilizing in- 
formation gleaned from his personal contacts 
and from his impressions derived from the 
Ss’ responses to a battery of tests and experi- 
ments, the psychologist rated each S as either 
high or low on aggression and classified him 
as either extrapunitive or intropunitive. It 
should be emphasized that the TAT was not 
in the battery of tests analyzed by this psy- 
chologist and that he was not the same per- 
son who made the projective ratings. 
Statistical treatment of data. Association
between the various sets of ratings is presented
in the form of 2 × 2 contingency
tables. Because of the low expected frequencies
in the cells of these tables, Fisher’s (3)
exact method for determining significance of
association is employed rather than the more
conventional chi-square technique. Thus, the
statistic utilized to report significance of findings
is Fisher’s p.


Table 2

TAT Rating and Clinical Rating on
Amount of Aggression

                    TAT rating
Clinical rating    High    Low
High                 5       4
Low                  7       4

Fisher's p = .54.


Table 3

TAT Rating on Direction of Aggression and Self
Rating on Amount of Aggression

                TAT rating
Self rating    Out    In
High            10     1
Low              1     8

Fisher's p = .001.


Results


From the results presented in Table 1, it
appears that the relation between TAT ratings
and self ratings on amount of aggression
is essentially random. This finding might be
regarded as negative evidence for the validity
of the TAT. Additional evidence suggesting
lack of TAT validity might be drawn from
the fact that the relation between TAT ratings
and clinical ratings on amount of aggression,
shown in Table 2, is also random.
Upon further analysis, however, it appears
that the seeming lack of validity of the TAT
may be due to an artifact.

The reason is suggested by the findings
shown in Table 3. Here it becomes evident
that there is an almost perfect association,
significant beyond the .001 level, between the
amount of aggression avowed in self ratings
and the direction of aggression as assessed in
thematic apperceptions. Individuals with pronounced
n Intraggression, according to an experienced
psychologist’s interpretation of the
TAT’s, rated themselves as low on n Aggression.
This finding suggests that the men
evaluated themselves under the assumption
that only outward-directed aggression was to
be considered in estimating the amount of
one’s own aggressiveness.


Table 4

Clinical Rating on Direction of Aggression and
Amount of Aggression

                            Clinical rating (direction)
Clinical rating (amount)    Out    In
High                          8     1
Low                           3     8

Fisher's p = .01.


Table 5

TAT Rating on Direction of Aggression and
Amount of Aggression

                       TAT rating (direction)
TAT rating (amount)    Out    In
High                     5     7
Low                      5     3

Fisher's p = .33.

In this regard, it is noteworthy, as evi- 
denced by Table 4, that the clinician also 
tended (p = .01) to equate a high rating on 
amount of aggression with a rating of “Anger 
Out.” These statistically significant findings 
are reinforced by the observation that out of 
the entire group of Ss only one individual was 
rated high on aggression by both himself and 
the clinician and was also rated clinically as 
directing his aggression inward. The clinician 
was fully aware that need aggression can gain 
expression in two directions. Yet a major dif- 
ference between the clinical ratings and the 
TAT ratings was that for the clinician high 
n Aggression was associated with “Anger Out” 
while for the TAT, as shown in Table 5, high 
and low n Aggression occurred independently 
of the rating on “Anger Out” or “Anger In.” 
As a result, of course, the projective ratings 
bear a random relation to the clinical ratings. 
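
The exact probabilities attached to the five tables can be approximately reproduced with the Python sketch below. It assumes that the reported value is a one-tailed Fisher p: the summed hypergeometric probability of the observed table and of all tables more extreme in the observed direction, with the margins held fixed. The report does not say which tail convention was used, so this is a reconstruction rather than the authors' stated procedure. Cell counts are entered as read from Tables 1 through 5.

```python
from math import comb

def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher exact p for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of the observed upper-left count
    and of all counts more extreme in the direction of its departure from
    the expected count, keeping row and column totals fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    expected = row1 * col1 / n
    cells = range(a, highest + 1) if a >= expected else range(lowest, a + 1)
    favorable = sum(comb(row1, k) * comb(row2, col1 - k) for k in cells)
    return favorable / comb(n, col1)

# Cell counts (row 1 left, row 1 right, row 2 left, row 2 right).
TABLES = {
    "Table 1": (6, 5, 6, 3),
    "Table 2": (5, 4, 7, 4),
    "Table 3": (10, 1, 1, 8),
    "Table 4": (8, 1, 3, 8),
    "Table 5": (5, 7, 5, 3),
}

p_values = {name: fisher_one_tailed(*cells) for name, cells in TABLES.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```

On this assumption the computed values fall within about .01 of the published ones (.47, .54, .001, .01, and .33), which suggests, without proving, that one-tailed probabilities were reported.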


Discussion 


On the basis of these findings, one might 
conclude that the TAT measure of inward- 
directed aggression is invalid. The alternative 
conclusion, however, is that the TAT is mak- 
ing a unique contribution to the case record. 
It seems to us that the latter interpretation is 
more plausible. In American culture there are 
many legitimate means for expressing “Anger 
Out,” even in quite “proper” situations. For 
example, TAT stories of murder and assault 
reflect, in an extreme degree, a need that in 
everyday life may find less drastic expression 
in such things as changing an appointment, 
being harshly critical of the work of others,
or engaging in heated discussion of politics.
Often the TAT measure of n Extraggression 
will do no more than confirm the presence and 
extent of influence of this need that is dis- 
played in many facets of the case record. In 
fact, the person himself may be well aware 
of his aggressive feelings toward people and
situations in his environment, and, upon di- 
rect questioning, he may be both able and 
willing to present an accurate account of this 
aspect of his personality make-up. 

By contrast, however, n Intraggression does 
not have many outlets for expression in every- 
day behavior. In fantasy, one’s heroes may 
be victims of fate or may commit suicide, but 
in ordinary life one has little recourse be- 
yond an occasional “accident” or a self-criti- 
cal remark. Moreover, mild degrees of in- 
traggressive behavior are not defined as such 
by society at large. Therefore, it seems likely 
that the intraggressive need will not be writ- 
ten bold over all the case record. Here is 
where an “hydraulic” function is served by 
the TAT: it presents an opportunity to express
in fantasy a need that rarely gains behavioral
expression.

It seems that we are left with a slight 
variation of an old rule. This version is, “For 
needs with behavioral channels to expression, 
the TAT and behavior are parallel; while for 
needs with few behavioral channels, measures 
derived from the TAT and from behavior are 
inverse.” However, this formulation requires 
further consideration. Actually, both aspects 
of this formulation can be viewed as special 
cases of the general rule that “TAT projec- 
tions are directly parallel to other behavior.” 
The narrator whose TAT heroes feel n In- 
traggression possesses such a need himself in 
real life and it influences his actions, but so- 
ciety prescribes that he can act out this par- 
ticular need with less visible behavior. One 
result of this is that many men who have in 
their TAT protocols the theme “Aggression 
expressed outward followed by aggression di- 
rected inward” were rated predominantly as 
“Anger Out” and “High” in n Aggression both 
by the psychologist and by themselves. It 
seems that only the first part of the theme 
was readily translated into visible behavior. 

Occasionally, however, a shrewd observer 
managed to deduce the complete theme on 
the basis of subtle behavioral signs. For example,
one person, whose TAT heroes all
evidenced a strong need to direct their ag- 
gression outward but were subsequently “re- 
luctant” to express this feeling, was rated 
correctly by an observer who said that the 
subject was “so suspiciously nice I had the 
feeling there was something going on there.” 

Thus it seems plausible that the relation be- 
tween projected needs and behavioral needs is 
direct parallelism. Yet a projective test may 
add something unique to a case record be- 
cause certain needs find more condoned ex- 
pression in samples of fantasy than in sam- 
ples of any other kind of behavior. When it is 
equally easy to act on a given need both di- 
rectly and in projective situations, however, 
it seems likely that knowledge of the need 
will not be greatly increased by adding a 
projective test to one’s battery of assessment 
techniques. For such needs one may, at least 
when examining healthy people who have no 
motivation for distorting the data, do quite 
as well by eliciting an S’s direct report as by 
utilizing a complex projective method of per- 
sonality assessment. This point has recently 
been stated rather convincingly by Allport 
(1) and has received empirical confirmation 
in an experiment reported by Davids (2). 
This notion is further substantiated by the 
present finding, shown in Table 6, of significant
association (p = .01) between the Ss’
self evaluations of whether they were high or 
low on aggression and the clinician’s evalua- 
tion, based on many hours of assessment, of 
their standing on this attribute. 

Although he believes that direct methods 
are as effective as projective methods when 
applied to normal Ss, Allport (1) states that 
the TAT and other projective techniques 
often make an important contribution in the 
assessment of neurotics who are apt to curtail
their self-expressions defensively. We
would extend his statement and say that,
since the culture curtails the expression of
certain needs in almost everyone, the TAT
will contribute to increased understanding of
these needs in normals as well as in neurotics.
In fact, since the behavioral expression
of such needs may be even more inhibited in
emotionally healthy people, who are more
likely to be “culturally normal,” the relative
contribution made by projective methods may
be even greater when these needs are assessed
in normals than when they are assessed in
neurotics.


Table 6

Self Rating and Clinical Rating on Amount
of Aggression

               Self rating
Clinical
rating       High   Low
High           8     1
Low            3     8

Fisher's p = .01.


Conclusions 


In the present study, both the self and the 
clinician, we have inferred, were measuring 
and evaluating n Extraggression—a need that 
has many culturally sanctioned channels of 
expression. Under these conditions the pro- 
jective test behavior and the rest of the re- 
corded behavior are concordant, and, in this 
sense, nothing unique is added by the projec- 
tive method. However, it seems to us, the 
unique advantage and contribution of projec- 
tive techniques becomes evident when one ex- 
amines n Intraggression—a need whose ex- 
pression in behavior other than fantasy is 
culturally made difficult. 


Received March 18, 1955. 
Repression and the “Return of the Repressed”


John H. Flavell 


Clark University 


Zeller (4, 5) recently conducted an experi- 
ment unique among laboratory studies of 
repression. Not content merely to demonstrate 
again that something akin to repression of 
threat-associated material may occur in a 
memory experiment, Zeller further showed 
that a reminiscence or reinstatement of this 
forgotten material can be seen when the threat 
is later disassociated from it. The threat used 
in his study was that of an induced feeling 
of failure in the performance of a motor task. 
If Zeller has in fact created an experimental 
analogue of Freud’s “return of the repressed” 
(1), his study deserves to be considered an- 
other “first” among experimental investiga- 
tions of psychoanalytic phenomena. 

It is the purpose of the present study to 
attempt verification of Zeller’s findings. The 
present design incorporates two additional re- 
finements, however. First, in order to approxi- 
mate more closely than Zeller has done the 
clinical conditions under which repression and 
return of the repressed would ordinarily be 
expected to occur, a threat to personal adjust- 
ment was substituted for Zeller’s threat of 
failure on a motor task. Second, an attempt 
was made to discover whether repression and 
return of the repressed are selective or gen- 
eral in their effects, i.e., whether or not the 
repressive influence of threat is mainly con- 
fined to that memory material most closely 
associated with the threat. In view of Korner’s 
(2) experimental findings, the expectation 
would be that threat to personal adjustment 
would be selective rather than general in its 
effects. Therefore, the study proposes the
following three hypotheses: (a) a threat to
personal adjustment will produce a significant
decrement in the recall of verbal material
associated with this threat; (b) a partial
reminiscence of this forgotten material will
be in evidence after threat conditions are
removed; (c) forgetting and reminiscence
induced by threat and threat-removal will tend
to be specific to that verbal material most
closely associated with threat.


1 This study formed the basis for a Master’s thesis
submitted to Clark University in 1952. It was carried
out under the direction of Dr. Thelma G. Alper, for
whose invaluable assistance the author wishes to
express his gratitude.

2 Now at the University of Rochester.


Method 
Subjects 


Thirty-eight college freshmen and sopho- 
mores served as volunteer Ss. Prior to the 
experiment proper these Ss were divided into 
two equal-sized groups, roughly equated for 
nonsense syllable recall. The Ss were un- 
selected for sex, age, or intelligence. 


Procedure 


All Ss were seen individually for four con- 
secutive daily sessions. 


Session I. The Ss of both groups were shown a 
series of twelve nonsense syllables, exposed for two 
seconds each, under instructions to memorize for 
future recall. Two more exposures of the twelve 
syllables followed. The Ss were then instructed to 
write a one-word association to each syllable as it 
was presented a fourth time. Recall Test 1 followed. 
In this and the four subsequent Recall Tests the Ss 
were given four minutes in which to write down 
as many of the twelve syllables as they could re- 
member. 

Session II. For the experimental group Ss, Session
II proceeded as follows. When the S entered the
room he was introduced to a psychology professor³
and told that the latter had kindly consented to
“interpret” the S’s one-word associations of the
previous day as “normal” or “abnormal” with respect
to “those of the general college student population.”
The E then read aloud the twelve nonsense
syllables and the associations the S had previously
given to each. The professor commented “abnormal”
or “normal” after each syllable-association pair in a
prearranged sequence which was identical for all Ss.
In this manner it was insured that six of the nonsense
syllables were always indirectly associated with
a comment of “normal” and the remaining six with
“abnormal.” The procedure was then repeated “to
make sure the interpretations were correctly recorded.”
Following this, the professor left the room
and Recall Test 2 ensued, in which the S again
wrote down all the nonsense syllables he could
remember. The control group Ss were likewise given
two verbal presentations of the syllable-association
pairs followed by Recall Test 2. However, the professor
was not present and no interpretations of associations
were made. Rather, the two presentations
were given under the pretext, first, of making sure
that the associations were correctly recorded and, in
the case of the second presentation, of ascertaining
whether the Ss were satisfied with these particular
associations. Thus, for the experimental group, Session
II consisted of two “incidental learning” trials,
personality threat indirectly associated with six of
the twelve nonsense syllables, and Recall Test 2.
For the control group, the procedure was essentially
identical except that no threat was introduced.

3 Dr. Thelma G. Alper.

Session III. For the experimental Ss, Session III 
commenced with Recall Test 3. The psychology pro- 
fessor then entered and explained to the S in some 
detail that her interpretations of the day before were 
completely bogus, that the whole affair had been 
staged for experimental purposes to test the effect 
of stress on performance, etc. In order both to 
assess the effectiveness of the stress and to provide 
some catharsis for the Ss, each S was asked to com- 
ment on his reaction to the interpretations. About 
half of the Ss frankly stated that the interpretations 
had been very disturbing to them. The professed 
lack of anxiety on the part of the remaining Ss was 
in most cases belied by the presence of observable 
tenseness, facial grimaces, etc., and by the obvious 
relief most Ss showed when told they had been 
victims of an experimental hoax. Recall Test 4 im- 
mediately followed this “threat removal” interview. 
The control Ss were likewise given Recall Tests 3 
and 4, separated, however, by an empty interval of 
approximately the same duration as the experi- 
mental Ss’ interviews. 

Session IV. The last session, identical for both 
groups, consisted solely of Recall Test 5. It is thus 
evident that, for the experimental Ss, the design of 
the study was essentially as follows: induction of 
threat—immediate recall—24-hour recall; and, re- 
moval of threat—immediate recall—24-hour recall. 
The control group followed the same procedure except 
that personality threat was nowhere introduced. 


Results and Discussion


Figure 1 shows “normal,” “abnormal,” and
“total” nonsense syllable retention curves for
the two groups. The following within-group
changes in mean recall scores yield statistically
significant t’s (p < .05) for a one-tailed
test: (a) C group increase in “total”
recall from Recall Test 1 to Recall Test 2;
(b) E group decrease in “total” recall from
Recall Test 1 to Recall Test 2; (c) E group
increase in “total” recall from Recall Test 3
to Recall Test 4; (d) C group increase in
“normal” recall from Recall Test 1 to Recall
Test 2. The fact that the E group mean “total”
recall dropped significantly on Recall Test 2
whereas the corresponding C group score
actually showed a significant increase strongly
suggests that the hypothesized threat-induced
repression did occur. Likewise, the significant
increase in E group “total” recall in Recall
Test 4 confirms the hypothesis of a remission
of repressive effects following removal of
threat. It was therefore concluded that our
experimental findings concur with those of
Zeller, both with respect to repression proper
and with respect to what has been termed “a
return of the repressed.”

Fig. 1. Nonsense syllable recall for experimental
(threatened) and control (nonthreatened) groups.

The hypothesis that the repression process 
would be selective rather than general in its 
effects does not appear to be confirmed by
the data. It is apparent from Figure 1 that 
nonsignificant decrements in both “normal” 
and “abnormal” syllable recall contribute al- 
most equally to the significant “total” E 
group decrease from Recall Test 1 to Recall 
Test 2. Similarly, the increase in E group 
“abnormal” recall from Recall Test 3 to Re- 
call Test 4, although perhaps impressive look- 
ing, is not statistically significant. It is pos- 
sible that repression tends to be selective or 
general depending upon the extent to which 
threat-associated material is clearly articu- 
lated from material not associated with 
threat. In Korner’s study, the threat was un- 
equivocally associated with certain discrete 
and self-contained stories and just as un- 
equivocally not associated with other such 
stories. In the present study threat was in- 
directly associated with certain nonsense syl- 
lables imbedded in a series of similar syl- 
lables and thus differentiation may have been 
difficult. Hence, the spread of repressive ef- 
fects to material not specifically associated 
with threat in the present study may be an 
example of what Scheerer (3) has referred to 
as “generalization by default.” 


Summary 


The present study investigated the effects 
of introduction and subsequent removal of
threat to personal adjustment on the recall of
nonsense syllables, some of which were indi- 
rectly associated with the threat. The experi- 
mental results confirmed the hypotheses that 
threat would produce a decrement in recall 
analogous to defensive repression and that 
subsequent threat removal would cause a par- 
tial reminiscence of forgotten material akin 
to Freud’s “return of the repressed.” The hy- 
pothesis that threat would be selective rather 
than general in its effect upon recall was not 
confirmed. 


Received April 18, 1955. 
Further Validation of the Ego-Strength Scale¹


Robert D. Wirt²


The Minnesota Multiphasic Personality In- 
ventory was administered to a population of 
hospitalized patients who had been treated by 
psychotherapy alone. These were scored for 
Barron’s Ego-Strength scale (1). The results 
confirmed, for a hospitalized sample, Barron’s 
findings for clinic samples: the Ego-Strength 
scale has good predictive power for response 
to psychotherapy. 

All patients admitted to the Minneapolis 
Veterans Administration Hospital during the 
period February 1, 1953, to February 1, 1954, 
routinely were administered the MMPI. Of 
these 535 cases, the 225 in whom there were 
no organic complications and who did not 
receive physiological adjuncts to treatment 
were selected for study. Upon discharge and 
without knowledge of the Es score, each pa- 
tient was rated by his psychiatrist as Unim- 
proved, Improved, or Greatly Improved. The 
chief of the neuropsychiatric service, who 
acted as supervisor for all cases treated by 
psychotherapy alone, also rated the patients 
using these categories. The 203 cases who received
identical ratings and who were treated
by psychotherapy alone were selected for the
sample. While only one item (No. 45) significantly
separated the Greatly Improved
from the Unimproved patients, and four items
(Nos. 95, 109, 355, 554) tended in the direction
opposite from that found by Barron,
the scale as a whole separated the groups at
better than the .05 level of probability. The
mean raw score for the sample was 42.3 and
the standard deviation was 12.46. It was possible
to construct a conversion table of T
scores for use with hospitalized male patients.


1 An extended report of this study may be obtained
without charge from Robert D. Wirt, Department
of Psychology, University of Minnesota,
Minneapolis 14, Minnesota, or for a fee from the
American Documentation Institute. To obtain it from
the latter source, order Document No. 4672 from
ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25,
D. C., remitting in advance $1.25 for microfilm or
$1.25 for photocopies. Make checks payable to
Chief, Photoduplication Service, Library of Congress.

2 Now at the University of Minnesota.

No patient in the Unimproved Group ex- 
ceeded the median of the patients in the 
Greatly Improved Group. If a raw score Es 
value of 50 is used as the cutting point, there 
are no false positive predictions, i.e., all pa- 
tients above this line were improved. Since 
the demand for psychotherapy far exceeds 
the supply of therapists available to do psy- 
chotherapy, this finding has considerable prac- 
tical importance. 

It is true that if on similar samples the Es 
scale were used for selection, some patients 
who would respond well to psychotherapy 
would not be chosen, but all those selected 
could accurately be predicted to show im- 
provement. 
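
The selection rule described above can be sketched as follows. The conversion table itself is not reproduced in this report, so the linear T-score formula (mean 50, standard deviation 10) is an assumption, as are the function names; only the sample mean, the standard deviation, and the cutting point come from the text.

```python
SAMPLE_MEAN = 42.3    # mean raw Es score reported for the sample
SAMPLE_SD = 12.46     # standard deviation reported for the sample
CUTTING_POINT = 50    # raw Es score used as the cutting point


def t_score(raw):
    """Linear T score relative to the hospitalized sample (assumed formula)."""
    return 50 + 10 * (raw - SAMPLE_MEAN) / SAMPLE_SD


def predicted_to_improve(raw):
    """Select only patients whose raw Es score lies above the cutting point."""
    return raw > CUTTING_POINT


# Hypothetical raw scores, for illustration only.
for raw in (38, 50, 57):
    print(raw, round(t_score(raw), 1), predicted_to_improve(raw))
```

Under the assumed formula the cutting point of 50 corresponds to a T score of about 56, so the rule selects patients somewhat more than half a standard deviation above the sample mean.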


Brief Report 
Received July 26, 1955. 
The Generality of the Prediction of Self Reports


John J. Wittich¹


DePauw University


The reverse-testing technique as proposed
by Steinmetz (3, 4) is a procedure in which
one person is asked to simulate the answers
of another on a preference-type inventory.
Although one use of the technique was to locate
areas of misunderstanding or of differences in
an individual marriage or parental counseling
situation, some work has been reported in
which groups utilized the cross administration
principle (1, 2). Since little or no study
had been devoted to the generality of the
ability to predict or to the correlates of
“understanding,” a preliminary study was made
(5).


Preliminary Study


With the limitations imposed by the procedure
itself, the purpose of the previous
investigation was: (a) to determine the degree
of generality in the ability to predict the
responses of others on a preference-type
inventory, (b) to ascertain the relationship of the
adjustment of one individual to his ability to
predict the responses of others, and (c) to
determine the relationship of the adjustment
of a person to the success with which he is
predicted.

In the study, each of 106 students chose
two acquaintances to predict, item by item,
on the Bell Adjustment Inventory. The predictions
and the students’ original responses
were treated to yield adjustment scores for
each predicting and predicted subject and
two sets of prediction or “understanding”
scores. The results of the intercorrelations of
the distributions are tabulated in Table 1.

Pertinent partial correlations were also presented,
and the following conclusions were
drawn:

1. There is evidence that the ability to predict the
responses of others may be regarded as a trait.

2. There is some evidence that the adjustment of
the predictor is related to his ability to predict, but
there is also evidence of an artifactual relationship.

3. There is some evidence that the adjustment of
the person predicted is related to the ease with
which he can be predicted, but there is also evidence
of an artifactual relationship.


1 This study is condensed from a section of a doctoral
dissertation submitted to Stanford University
in 1951. The author appreciates the assistance
afforded by L. G. Humphreys, who was then on the
Stanford faculty.


As a result of the preliminary study, the 
following suggestions for a future investiga- 
tion were made: (a) the design of the experi- 
ment should not provide for the students to 
choose acquaintances to predict; (b) the de- 
sign of the experiment should be such that 
the subjects will be required to predict sev- 
eral specific people with whom they have had 
time to become acquainted; and (c) the de- 
sign should eliminate items on the instrument 
used which lack face validity. The present 
study meets these requirements. 


The Problem 


The primary objectives of the present experiment
are (a) to test generality with a
number of prediction, or understanding, tasks;
(b) to re-examine with a new design the relation
between adjustment and understanding;
and (c) to investigate the relation between
adjustment and the capacity to be understood.


Table 1 
Intercorrelations of Five Variables 

(N = 106) 
Variables 2 3 4 5 
1. Total-Score-Student 
2. Student-for-A 48 A444 
3. Student-for-B 04 46 
4. Total-Score-A 25 


5. Total-Score-B 
Note. The standard error of an r of .00 is .09.


The Experiment


Subjects. The subjects were 42 enlisted 
men in the United States Air Force who had 
previously exhibited college-level ability on 
standardized batteries of tests. These subjects 
were not living together in barracks in the 
usual propinquity of military life, but they 
had been working together in closely-knit 
work units daily for at least four months. 
The range in age was from 19 to 33 years. 
These subjects were assigned to groups of six 
members each (within their own work units) 
and were required to predict and to be pre- 
dicted by the other five group members. 

Instrument. A revised and shortened Bell 
Adjustment Inventory was employed in this 
experiment since it was found that many 
items in the standard inventory, particularly 
those referring to health adjustment, were 
inappropriate for servicemen. The total num- 
ber of items retained was 100. 

Another important change entailed the 
elimination of the “?” as an alternative re- 
sponse. From the preliminary experiment it 
appeared that the effect of the “?” response 
for prediction may have been to increase the 
error involved. 

It should be expected that the reliability of 
this revised instrument will be lower than that 
of the standard inventory since the revision 
has fewer items. The analysis of variance to 
be described later in this experiment will in- 
dicate that minimum reliability was obtained. 

Directions. On the first day of the experi- 
ment, the airmen were given assurance that 
the self reports would not be used against 
them. After the self reports had been obtained 
from the subjects, answer sheets for the pre- 
dicted responses were given them. Each pre- 
dictor was instructed, in each case, to attempt 
to duplicate the answers of another member 
of his work group by responding to each item 
as he thought this subject would respond. 
Subsequently, each subject was required to 
attempt to duplicate the responses of every 
other subject in his group. Each airman, 
therefore, contributed to the experiment one 
self report and five prediction reports. 

Scoring. Two types of scores are considered 
in the present experiment: 

1. The adjustment scores were obtained by 
comparing the typical scoring key of the revised
inventory with the observed responses
of the airmen on their self reports.

2. The prediction scores were obtained by 
comparing, item by item, the predicted re- 
sponses with the self report of the predicted 
subject. 

Treatment of the scores. An analysis of the 
variance of the prediction scores was the test 
of generality, but the correlational method 
was applied to explore the relationships be- 
tween adjustment, understanding, and the ca- 
pacity to be understood. Three distributions 
were provided for statistical analysis: 


1. The adjustment scores were obtained as 
described above. 

2. The understanding distribution was ob- 
tained by averaging the prediction scores each 
airman obtained as he predicted five subjects. 

3. The capacity to be understood was es- 
tablished by averaging the five prediction 
scores which were obtained for each predicted 
subject. This provided a distribution of mean 
scores. 
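
A minimal sketch of the understanding and understood distributions just described, assuming that a prediction score is the number of items on which the predicted response agrees with the self report (the text says only that the responses were compared item by item). The three-person group, the four-item answer sheets, and all names are hypothetical.

```python
from statistics import mean


def prediction_score(predicted, self_report):
    """Count the items on which the prediction matches the self report."""
    return sum(p == s for p, s in zip(predicted, self_report))


def understanding_and_understood(self_reports, predictions):
    """Return mean prediction scores per person, as predictor and as predictee.

    self_reports: {person: list of item responses}
    predictions:  {(predictor, predictee): list of predicted responses}
    """
    scores = {
        (predictor, predictee): prediction_score(answers, self_reports[predictee])
        for (predictor, predictee), answers in predictions.items()
    }
    people = list(self_reports)
    understanding = {
        p: mean(v for (predictor, _), v in scores.items() if predictor == p)
        for p in people
    }
    understood = {
        p: mean(v for (_, predictee), v in scores.items() if predictee == p)
        for p in people
    }
    return understanding, understood


# Hypothetical data: three subjects, four yes/no items.
self_reports = {
    "A": ["yes", "no", "yes", "no"],
    "B": ["no", "no", "yes", "yes"],
    "C": ["yes", "yes", "yes", "no"],
}
predictions = {
    ("A", "B"): ["no", "no", "no", "yes"],
    ("A", "C"): ["yes", "yes", "yes", "no"],
    ("B", "A"): ["yes", "yes", "yes", "no"],
    ("B", "C"): ["no", "yes", "yes", "no"],
    ("C", "A"): ["no", "no", "no", "no"],
    ("C", "B"): ["no", "no", "yes", "no"],
}
understanding, understood = understanding_and_understood(self_reports, predictions)
print(understanding)
print(understood)
```

In the experiment itself each mean would rest on five prediction scores rather than two, since every airman predicted, and was predicted by, the other five members of his group.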


Results and Discussion 


Generality of the ability to predict. Table 
2 contains the variance estimates which are 
considered appropriate for the purposes of 
the investigation. 

It will be noted that the variance between
groups has been computed. This was done so
that the influence of the grouping may be
eliminated from the F ratios under consideration.
The table is broken down into the
following sources of variance: (a) the variance
between predictors, (b) the variance between
predictees, and (c) a residual or interaction
variance. The variance within the
scores of predictors and the variance within
the scores of predictees are also used.


Table 2

Variance Table for Scores Categorized in Terms of
Predictors and Predictees

                          Sums of             Variance
Sources                   squares     df      estimate
Between Groups              2,386      6       397.67
Between Predictors          4,683     35       133.80
Between Predictees          8,492     35       242.63
Residual (Interaction)      3,860    133        29.02
Within Predictors          12,352    168        73.52
Within Predictees           8,543    168        50.85

Total                      19,421    209

Predictor F ratio = 133.80/29.02 = 4.61; p < .001.
Predictee F ratio = 242.63/29.02 = 8.36; p < .001.

In order to investigate the ability to pre- 
dict, the variance between predictors was em- 
ployed as the numerator and the residual or 
interaction variance was employed as the de- 
nominator. The resultant F ratio was signifi- 
cant beyond the .001 level. 
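
The arithmetic behind the F ratios can be sketched as below. Sums of squares and degrees of freedom are taken from Table 2; the residual term is obtained by subtraction from the total, on the assumption that groups, predictors, and predictees exhaust the other sources, which gives 3,860 on 133 df. The probability levels are not recomputed, since that requires the F distribution, and the helper names are illustrative only.

```python
# Sums of squares (SS) and degrees of freedom (df) as read from Table 2.
TOTAL = (19421, 209)
BETWEEN_GROUPS = (2386, 6)
BETWEEN_PREDICTORS = (4683, 35)
BETWEEN_PREDICTEES = (8492, 35)


def subtract(total, *sources):
    """Remove the listed sources from the total, for both SS and df."""
    ss = total[0] - sum(s[0] for s in sources)
    df = total[1] - sum(s[1] for s in sources)
    return ss, df


def variance_estimate(source):
    """Variance estimate (mean square) = sum of squares / degrees of freedom."""
    ss, df = source
    return ss / df


# Assumption: the residual (interaction) is whatever remains of the total.
RESIDUAL = subtract(TOTAL, BETWEEN_GROUPS, BETWEEN_PREDICTORS, BETWEEN_PREDICTEES)
# Within-predictor variation: the total less groups and predictors.
WITHIN_PREDICTORS = subtract(TOTAL, BETWEEN_GROUPS, BETWEEN_PREDICTORS)

f_predictor = variance_estimate(BETWEEN_PREDICTORS) / variance_estimate(RESIDUAL)
f_predictee = variance_estimate(BETWEEN_PREDICTEES) / variance_estimate(RESIDUAL)

print(RESIDUAL, WITHIN_PREDICTORS)
print(round(f_predictor, 2), round(f_predictee, 2))
```

Both ratios are referred to the F distribution with 35 and 133 degrees of freedom.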

It is possible to express the magnitude of 
the generality of the ability to predict in more 
than one frame of reference. Using the vari- 
ance between predictors and the residual vari- 
ance as the basis for computation, the aver- 
age correlation between predictors was found 
to be .42. This coefficient, representing an
intraclass correlation, may be interpreted as 
expressing the magnitude of the generality 
which might be expected if all the predictors 
predict subjects of equal predictability. In 
order to obtain generality of this magnitude, 
control of the predicted subjects would be 
necessary. 

It is also legitimate in this situation to uti- 
lize the variance within predictors in com- 
bination with variance between predictors to 
compute an average coefficient of generality. 
When this is done the coefficient is equal to 
.14. These results are of the magnitude which 
would be expected if the experimental ar- 
rangements were such that the predictors were 
all predicting different subjects. 

The level of significance of the F ratios in- 
dicates that the variance of the scores by pre- 
dictors must be regarded as the result of non- 
chance differences between predictors in the 
ability to predict. There is generality in the 
ability to predict the responses of others, 
though the amount is of little practical sig- 
nificance unless there is control of predictees. 
The low average correlation of .42 between 
predictors under the more controlled condi- 
tion indicates that any assumptions concern- 
ing the understanding ability of a subject on 
the basis of one or two predictions is subject 
to great error. 

Generality of the capacity to be predicted. 
In order to investigate the capacity to be pre- 
dicted, the variance between predicted indi- 
viduals was used as the numerator and the 
residual or interaction term as the denomi- 
nator. The resultant F ratio was significant 
beyond the .001 level. When the variance 
terms employed in the above F ratio were 
used to express the average correlation be- 


tween predictees, the coefficient of generality 
was .59. Since this coefficient was derived 
from the variance between predicted indi- 
viduals and the residual variance after the 
influence of group and predictor variance 
were removed, it is legitimate to interpret it 
as an expression of the expected results when 
all the predictees are predicted by predictors 
of the same ability to predict. 

When the variance within predictees was 
employed with the variance between pre- 
dictees to compute the average coefficient of 
generality, the resultant coefficient was .43. 
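
The coefficients of generality reported above can be approximated from the variance estimates of Table 2. The formula used is not stated; the sketch below assumes the usual one-way intraclass correlation with k = 5 scores per person, which reproduces .42, .14, and .43 and gives about .595 where .59 is reported. The function name is illustrative only.

```python
def intraclass_r(ms_between, ms_error, k):
    """Intraclass correlation from two variance estimates.

    ms_between: variance estimate between persons
    ms_error:   variance estimate used as the error term
    k:          number of scores averaged per person
    """
    return (ms_between - ms_error) / (ms_between + (k - 1) * ms_error)


K = 5  # each airman predicts, and is predicted by, five others

# Predictors: residual error term, then within-predictors error term.
predictors_controlled = intraclass_r(133.80, 29.02, K)
predictors_uncontrolled = intraclass_r(133.80, 73.52, K)

# Predictees: residual error term, then within-predictees error term.
predictees_controlled = intraclass_r(242.63, 29.02, K)
predictees_uncontrolled = intraclass_r(242.63, 50.85, K)

for value in (predictors_controlled, predictors_uncontrolled,
              predictees_controlled, predictees_uncontrolled):
    print(round(value, 3))
```

The choice of error term carries the interpretation: the residual term corresponds to the controlled case described in the text, and the within term to the case in which no such control is exercised.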

These results, like those which concern the 
generality of the ability to predict, must be 
interpreted only in the light of the amount 
of restriction placed on the variance esti- 
mates. There can be little doubt that there 
is some evidence for nonchance differences be- 
tween predictees in the capacity to be pre- 
dicted, regardless of the variances used in the 
computation. The average coefficients of .59 
and .43 indicate that the capacity to be pre- 
dicted may be considered a trait. 

Coefficients of correlation. Zero-order cor- 
relation coefficients with relation to the ad- 
justment, understanding, and understood dis- 
tributions were computed. 

A zero-order correlation coefficient of .11 
between the adjustment of the airman and 
his ability to understand others provides no 
evidence for a relation between adjustment 
and understanding. The correlation of .05 be- 
tween understanding and being understood 
indicates that understanding, whatever its na- 
ture, is independent both of the predictor’s 
adjustment and his capacity for being un- 
derstood. 

The correlation coefficient of .73 between 
the airmen’s adjustment scores and the mean 
of the understanding scores which group 
members attained using them as subjects, 
merits extended consideration. It would ap- 
pear from this coefficient that the better the 
adjustment of the predictee, the more easily 
he is understood. The viewpoint might be ad- 
vanced that the predictors tend to predict 
adjusted responses. Then, if the person for 
whom they are attempting to duplicate re- 
sponses is adjusted, the understood score will 
be high. If, however, the subject is malad- 
justed, the adjustment score will be low as 
will the level of understanding. However, an 
inspection of the distributions of prediction 
scores, scored for adjustment, found the
means of these distributions to be about the 
same as the means of the actual adjustment 
scores of the subjects. The scatter diagram 
of the adjustment-understood relationship 
showed that, while the great majority of 
deviations from a roughly estimated best-fit 
line were instances where the predictee was 
misunderstood and well adjusted, there were 
no instances where the predictee was well un- 
derstood and maladjusted. Support must be 
given, therefore, to a general thesis that the 
maladjusted are difficult to predict. The rea-
son they are not understood is not that
predictors refuse to predict maladjusted re-
sponses.

Another view of this finding may be found 
in the logical difference between predicting 
for an adjusted and a maladjusted individual. 
A predictor who believes he is predicting an 
adjusted subject marks the prediction report 
with the obvious adjusted answers. If he be- 
lieves his subject is maladjusted, he has the 
task of determining which of the areas are 
maladjusted, and which items represent those 
areas. This is evidently a difficult task. 


Conclusions 


All conclusions are made in the frame of 
reference of the present study. 

1. The ability to predict the responses of 
others may be regarded as a trait. 

2. The capacity to be predicted by others 
may be regarded as a trait. 

3. There is a positive relationship between 
the adjustment of a subject and the success 
with which others understand him. 


Summary 


The primary objectives of the experiment 
are (a) to determine the degree of generality 
in the ability to predict the responses of 
others on a preference-type inventory, (b) to
ascertain the relationship of the adjustment 
of one individual to his ability to predict the 
responses of others, and (c) to determine the 
relationship of the adjustment of a person to 
the success with which he is predicted. A 
preliminary study had pointed out the neces- 
sity for an experimental design involving 
several prediction tasks for each subject. 


A revised and shortened Bell Adjustment 
Inventory was employed as 42 subjects, ar- 
ranged into six-man work groups, were re- 
quired to predict and to be predicted by the 
other five group members. 

Adjustment scores were obtained for each 
subject by utilizing the typical scoring key. 
Prediction scores were obtained by compar- 
ing, item by item, the predicted responses 
with the self report of the predicted subjects. 
These latter scores could be organized both 
in terms of the predicted scores which each 
subject obtained for others, and also by the
predicted scores that were obtained by others
for each subject.

An analysis of variance revealed that the 
capacity to be understood provides more sub- 
stantial evidence for generality than the abil- 
ity to understand, but that there is evidence 
for both. Intercorrelations between the ad- 
justment of the subjects, their ability to un- 
derstand others, and their capacity to be 
understood resulted in only one marked re- 
lationship, that between the airmen’s adjust- 
ment scores and the mean of the understand- 
ing scores which group members attained 
using them as subjects. The thesis that the 
maladjusted are difficult to predict is sup- 
ported. 

It may be said in the frame of reference of 
the present study that (a) the ability to pre- 
dict the responses of others may be regarded 
as a trait, (b) the capacity to be predicted
by others may be regarded as a trait, and (c) 
there is a positive relationship between the 
adjustment of a subject and the success with 
which others understand him. 


Received March 29, 1955. 
Empathy, Similarity, and Self-Satisfaction¹


Howard M. Halpern
Bronx VA Hospital


The recent review by Taft (7) of studies 
dealing with the ability to judge people re- 
flects the current extensive interest and ex- 
perimentation in this area. One problem that 
has attracted particular attention is the ef- 
fect of similarity between predictor and pre- 
dictee on the accuracy of predictions. 

The issue is not a new one. In their experi- 
mental work on personality judgment, Wolf 
and Murray (8) noted that subjects were 
most accurate in predicting about people 
whose average ratings were most similar to 
their own and least accurate in making esti- 
mations about those whose average ratings 
were least similar. They thus raised a problem 
that is of utmost importance in the study of 
the empathic process. Do people empathize 
better with others who are relatively similar 
to themselves? If so, what is the role played 
by projection or attribution?² Wolf and
Murray were inclined to believe that projec-
tion did not account for their result. They
concluded, “The best explanation seems to be
that man can only understand what he has
already experienced. One might hazard the
statement that without empathy a man can-
not make an accurate diagnosis and he can
best empathize with those whose responses
resemble his own” (8, p. 363).


1 This experiment was part of a doctoral research
study at Teachers College, Columbia University (5).
Grateful acknowledgments are extended to the au-
thor’s dissertation committee, Professors Paul Eiserer,
Edward J. Shoben, and Herbert Solomon. Dr. Gerald
Bauman of the New York City Domestic Relations
Court and Dr. Jacob Cohen, Chief, New York Train-
ing Unit, Veterans Administration, aided with the
design and statistical treatment of the data. Dean
Margaret T. Shay of the Adelphi College School
of Nursing graciously made her students available
as subjects.

2 The term attribution rather than projection is
preferred to avoid confusion with the psychoanalytic
meaning of projection which refers only to the
imputing of unconscious, unacceptable ideas to others.
Attribution does not differentiate between con-
sciously or unconsciously imputed characteristics.

Bender and Hastorf (2) proposed an opera- 
tional definition of projection as the relation 
between a subject’s self-ratings and the rat- 
ings he attributes to others. From this they 
developed what they called a “refined empathy 
score” (3) by subtracting a subject’s projec- 
tion score from his total predictive accuracy. 
They felt that this refined score was a better 
measure of the subject’s empathic ability. 
They noted also that while the raw empathy 
score correlated significantly with similarity, 
the refined score did not. Nevertheless, they 
found cause for dissatisfaction with their 
method of attempting to tease out projection 
(or attribution) from over-all predictive ac- 
curacy. They stated, “We suggest the need 
for a methodology for differentiating between 
projection and empathy in these prediction 
tasks. The next step would seem to be to ob- 
tain predictions from individuals for a num- 
ber of their associates; these associates should 
differ in the amount of their similarity with 
the predictor. These data could be used to 
determine more closely the relationship be- 
tween similarity, projection, and empathy” 
(6, p. 576). 

One major aim of this study is the ex- 
ploration of the relationship between simi- 
larity and empathy. 

A second area of concern is the determina- 
tion of the relationship between a person’s 
self-satisfaction or self-dissatisfaction in a 
given personality area and his ability to pre- 
dict accurately in that area. 


Procedure 
Subjects 


The subjects were 38 female student nurses 
at the Adelphi College School of Nursing. For 
at least two years they had been in close as-
sociation with the other members of their
group in a variety of academic, social, and 
professional situations. The college faculty, 
employing sociometric methods, had divided 
the students into four “maximally harmoni- 
ous” groups. Each group served a different 
hospital unit. 


Instrument 


With the aid of judges and a priori criteria, 
an instrument was devised consisting of 80 
items from the Guilford-Martin Inventory of 
Factors GAMIN (4)³ that were considered
most suitable for use in a predictive empathy 
test. An equal number of items was selected 
from each of the five factors. 

Two further changes of the GAMIN in- 
ventory were made, one by omission and one 
by addition. The opportunity to respond to an 
item with a question mark, if a “Yes” or “No” 
answer could not easily be given, was elimi- 
nated, thus forcing the subjects to respond 
dichotomously. The second change involved 
asking the subjects to indicate whether they 
were “Pleased” or “Dissatisfied” with them- 
selves in the personality area tapped by each 
item by encircling a “P” or a “D” after 
answering the item. 


Administration 


Each group was seen twice. At the first 
session the subjects were asked to rate them- 
selves on the revised GAMIN inventory. Be- 
fore the groups were seen again, each subject’s 
similarity to each of the other members of 
her group was determined by tabulating the 
number of “Yes” or “No” items that they 
answered in the same way. Then, the two
group members most Similar and the two most
Dissimilar to each subject, plus a member be-
tween these extremes, were selected to func-
tion as predictees or referents for that subject.

When the subjects were seen two weeks 
later they were instructed to predict how 
each of their five referents responded to the 
inventory during the first session. 
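
The tabulation and selection steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy six-member group, the eight-item response strings, and the rule of taking the middle-ranked member as the intermediate referent are all hypothetical; the report does not say how ties or the intermediate member were handled.

```python
from typing import Dict, List


def similarity(a: List[str], b: List[str]) -> int:
    """Count the items two subjects answered in the same way."""
    return sum(x == y for x, y in zip(a, b))


def select_referents(subject: str,
                     responses: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Pick two most similar, one intermediate, and two most dissimilar members.

    Assumes the group has at least five other members; ties are broken by
    the order in which members are listed (an arbitrary choice).
    """
    others = [name for name in responses if name != subject]
    ranked = sorted(others,
                    key=lambda n: similarity(responses[subject], responses[n]),
                    reverse=True)
    return {
        "similar": ranked[:2],
        "intermediate": [ranked[len(ranked) // 2]],
        "dissimilar": ranked[-2:],
    }


# Hypothetical self-ratings (Y = "Yes", N = "No") for a six-member group.
responses = {
    "A": list("YYNNYNYY"),
    "B": list("YYNNYNYN"),
    "C": list("YYNNYYNN"),
    "D": list("YNNNNNYY"),
    "E": list("NNYYNYNN"),
    "F": list("NNYYNNYN"),
}
referents_for_a = select_referents("A", responses)
```

With these made-up ratings, subject A shares seven answers with B, six with D, five with C, two with F, and none with E, so B and D serve as Similar referents, C as the intermediate one, and F and E as Dissimilar referents.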


3 (G) general pressure for overt activity, (A)
ascendency as opposed to submission, (M) mascu- 
linity as opposed to femininity in attitudes and in- 
terests, (I) inferiority feelings, and (N) neurotic 
tendencies. 


Howard M. Halpern 


Hypotheses 


1. Subjects predict with greater accuracy 
about predictees who are similar to them than 
about those who are dissimilar to them. 

2. Subjects predict with greater accuracy 
on concordant items (items which they have 
answered in the same way as their predictees) 
than on nonconcordant items (items which 
they have answered differently from their 
predictees). 

3. On nonconcordant items there is greater 
accuracy in the predictions made about simi- 
lar referents than dissimilar referents. 

4. Ability to predict accurately on non- 
concordant items correlates positively with 
over-all predictive accuracy. 

5. On items where subjects have indicated 
that they are dissatisfied with their self-rat- 
ings they do not predict as accurately as on 
items where they have indicated that they 
are pleased. 

Results 


1. A t ratio indicated that predictions for
Similar referents were more accurate than for
Dissimilar referents at a level of significance
beyond .001 (t was 4.25 for 38 subjects).
Furthermore, a highly significant 30 of the 
38 subjects were better predictors when put- 
ting themselves in the place of those they 
resembled. When mean similarity scores (the 
average number of items each subject an- 
swered in concordance with each of her five 
referents on the self-ratings) were correlated 
with mean predictive accuracy (the average 
number of correct predictions each subject 
made about her five referents) the resulting 
r was a significant .84. 

2. The second hypothesis stated that the 
subjects predict with greater accuracy on con- 
cordant items than on nonconcordant items. 
A t ratio of 12.81 supported this hypothesis. 

3. On nonconcordant items, subjects did
not predict significantly more accurately for
Similar than for Dissimilar predictees (t was
1.19).

4. A nonsignificant correlation of .05 was 
found between a subject’s ability to predict 
accurately on nonconcordant items and her 
over-all predictive ability. 

5. The results of a t test indicated that
subjects predicted with greater accuracy on
“Pleased” items than “Dissatisfied” items at
beyond the .001 level of significance.⁴
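
The count reported in result 1 (30 of the 38 subjects predicting better for Similar referents) can be checked with a simple sign test. The report does not name the test it used for this count, so the sketch below is an assumption: it asks how often 30 or more of 38 subjects would favor Similar referents if each direction were equally likely. The function name is illustrative.

```python
from math import comb


def sign_test_upper_tail(successes: int, n: int) -> float:
    """Probability of at least `successes` out of `n` when p = .5."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# 30 of 38 subjects were more accurate for Similar referents.
p_one_tailed = sign_test_upper_tail(30, 38)
p_two_tailed = 2 * p_one_tailed
```

Under this assumption both the one-tailed and the two-tailed probabilities fall below .001, which agrees with the description of the count as highly significant.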


Discussion 


The data reflect a clear positive relation- 
ship between an individual’s similarity to an 
acquaintance and his ability to make accurate 
predictions about him. Furthermore, predic- 
tive accuracy is greater when the individuals 
involved resemble each other in the specific 
areas of prediction than when they differ. 


Empathy or Attribution? 


What accounts for this ability of indi- 
viduals to prognosticate better about Similar 
as opposed to Dissimilar people and within 
shared rather than unshared areas of per- 
sonality? Empathy has been operationally de- 
fined in terms of predictive accuracy, but it 
is apparent that the above result could be a 
function of the attribution of a subject’s own 
traits to others rather than genuine sensitivity. 

If, in nonconcordant areas, the subjects 
had predicted more efficiently about Similar 
rather than Dissimilar acquaintances this 
greater accuracy could only be a function of 
true sensitivity to similar people. In unshared 
areas, attribution can lead only to error and 
therefore could have been ruled out as a 
facilitator of more accurate prediction. 
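
The logic of this argument can be illustrated with a small sketch. It assumes dichotomous Yes/No items, as in the revised inventory, and a hypothetical predictor who simply attributes her own answers to her referent. The function name and the sample responses are illustrative and are not taken from the study's data.

```python
from typing import List, Optional, Tuple


def accuracy_by_concordance(own: List[str],
                            referent: List[str],
                            predicted: List[str]
                            ) -> Tuple[Optional[float], Optional[float]]:
    """Return predictive accuracy on concordant and nonconcordant items.

    An item is concordant when predictor and referent gave the same
    self-rating. A rate is None when there are no items of that kind.
    """
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for o, r, p in zip(own, referent, predicted):
        concordant = (o == r)
        totals[concordant] += 1
        hits[concordant] += (p == r)

    def rate(kind: bool) -> Optional[float]:
        return hits[kind] / totals[kind] if totals[kind] else None

    return rate(True), rate(False)


own = list("YYNNYNYY")        # predictor's self-ratings (hypothetical)
referent = list("YYNNYYNN")   # referent's self-ratings (hypothetical)
attributed = list(own)        # pure attribution: predict one's own answers

concordant_acc, nonconcordant_acc = accuracy_by_concordance(
    own, referent, attributed)
```

With two-choice items, a predictor who only attributes is right on every concordant item and wrong on every nonconcordant item, so any accuracy above zero on nonconcordant items cannot come from attribution.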

Such was not the case. Subjects did not 
predict significantly better for Similar as op- 
posed to Dissimilar referents in these un- 
shared areas. The increased over-all predic- 
tive ability found to exist when the predictor 
and predictee were similar was confined to 
the areas of their similarity. Once outside 
these concordant areas the predictors were 
just as much at sea when their total resem- 
blance to their referent was relatively great 
as when it was minimal. 

Because a person’s ability to empathize 
with similar people lies strictly within the 
boundaries of their similarity and disappears 
outside of these boundaries it might seem, at 
first glance, that an attributive mechanism 
must be at work. This is not necessarily so. 


4 The complete data of this study may be ob-
tained from University Microfilms, 313 No. First
Street, Ann Arbor, Michigan. Refer to H. M. Hal- 
pern, Some factors involved in empathy, Publication 
No. 8677. 


What may instead be indicated is that people 
cannot effectively predict about what they have 
not phenomenologically experienced. The close 
relationship between similarity and predictive 
skill might simply mean that there is a greater 
likelihood that a person would recognize 
feelings and patterns of behavior in others 
if he has known similar feelings and patterns 
of behavior in himself. When an individual is 
confronted with the emotions and actions of 
another person that are alien to his own ex- 
perience, accurate recognition, and hence, ac- 
curate prediction, is little better than a chance 
matter. 

The degree of similarity between two peo- 
ple may therefore reflect the overlapping of 
their capacities to empathize with each other. 
The rationale for the finding that even two 
grossly similar people cannot predict at better 
than chance accuracy in the few areas of dis- 
similarity would then be that they were at- 
tempting to empathize where little capacity 
for empathy existed. 

Although the equation of similarity with 
capacity for empathy has rarely been so 
pointedly suggested by experimentation, it is 
not a new idea. In 1928 Adams wrote: 


. . . any experience or mental process in another
organism can be inferred from structure, situation, 
history and behavior only when a similar experi- 
ence or mental process is or has been invariably as- 
sociated with similar structure, situation, history and 
behavior in oneself; and the probability of the in-
ference will be proportional to the degree of simi-
larity (1, p. 252).


Two alternative concepts, therefore, still 
remain as possibilities in explaining the find- 
ing that the greater predictive ability for 
similar people is restricted to the areas of 
their similarity. Either the concordant areas 
represent the presence of a capacity for mutual 
empathy, or are simply areas where attribu- 
tion will inevitably get good results. The 
methodology of this study has failed to dif- 
ferentiate the relative weight of these two 
factors. 

When the subjects’ ability to predict ac- 
curately on nonconcordant items was com- 
pared with their over-all predictive accuracy, 
no relationship was found. The question 
arises, which is the truer measure of empathy? 
Is it the ability to feel most extensively into 
the largest number of people, or the ability
to feel into those characteristics of others that 
differ from one’s own? 

Both arguments can be persuasive. It must 
certainly be granted that a measure which 
clearly eliminates attribution seems closer to 
the spirit of the term empathy. Yet, as we 
have seen, because a person accurately enters 
into the frame of reference of a similar person 
we cannot assume attribution nor rule out 
empathic sensitivity. Further, it would seem 
to be doing just as great an injustice to the 
meaning of the term empathy to designate as 
good empathizers people who cannot feel 
extensively into the phenomenological field of 
others but who can somehow predict about 
them relatively better in areas where they 
differ. 


Empathy and Self-Satisfaction 


The subjects predicted with significantly 
greater accuracy on items where they had 
indicated self-satisfaction rather than dis- 
satisfaction. One possible rationale for this 
finding is that in areas where a person is dis- 
content about his own behavior, disorganizing 
anxiety may be aroused and distortive de- 
fenses may be mobilized. Both of these factors 
may function to cause aberrations in accurate 
interpersonal perceptions. 


Who Is the Good Empathizer? 


The close relationship that was found be- 
tween similarity and empathy, particularly if 
similarity is considered a measure of the 
empathic potential existent in the situation, 
leads to inferences about the personality of 
the good empathizer. First of all, the phe- 
nomenological experiences of the good em- 
pathizer could not be drastically deviant from 
those of his reference group. He has, all 
things being equal, capacity for empathy with 
more people if he is near the center of the 
range on a given characteristic. Secondly, the 
wider his phenomenological experience, in 
terms of its breadth, fullness and richness, the 
more people will he be able to encompass, 
through similarity, in his empathic scope. 

The relationship between self-satisfaction 
and empathy suggests that a person who is at 
home with most of his own behavior is likely 


to be a better empathizer than those who are 
largely dissatisfied with themselves. 


Summary 


The ability of 38 student nurses, in four 
groups, to predict each other’s responses to a 
personality inventory was determined. This 
ability, which served as an operational defini- 
tion of empathy, was found to be positively 
correlated with (a) the similarity of predictor 
and predictee, and (b) the predictor’s satis-
faction with her own behavior in the area of
prediction. 

The methodology of this study failed to 
resolve whether the ability to predict more 
accurately for similar predictees was a product 
of attribution or true sensitivity. However, 
it is proposed that the factor of similarity 
may not be an artifact of the predictive 
method, as other experimenters have sug- 
gested, but that similarity may be a vital part 
of the empathic process. It may be that 
people can most readily recognize in others 
what they have experienced, on some level, 
in themselves. 


Received May 2, 1955. 
Cultural Symbolism: A Validity Study


Emanuel Starer 
Veterans Administration Hospital, Coatesville, Pa. 


Levy (1) reported that the results of his 
experiment with 62 normal children in the 
fifth grade of a public school did not support 
the hypothesis that “. . . certain environ- 
mental objects by virtue of their structural 
characteristics may be taken as being sym- 
bolic of the male and female genitalia” (1, 
p. 45). 

The purpose of the present study is to in- 
vestigate this same hypothesis that “where 
abstract objects are categorized by a given 
means as either male or female, those objects 
which may be described as elongated, pointed, 
or angular will be categorized as male; those 
objects which may be described as round and 
containing in nature will be categorized as 
female” (1, p. 43). 


Method 


The Ss in this experiment consisted of 64 
male psychotic patients and 48 female psy- 
chotic patients in varying degrees of remis- 
sion. The age range of the Ss in the male 
group and female group was 20-48 and 24-50,
respectively. The primary diagnosis in
98 per cent of the psychotic population was 
schizophrenic reaction. Another population 
consisting of 30 student nurses with a mean 
age of 19 and considered to be clinically nor- 
mal was also tested. 

The Ss were presented with ten cards con- 
taining figures which were presented in ran- 
dom order. Five of the figures were elongated 
or pointed and were assumed to be symbolic 
of male characteristics. Five figures were
rounded and containing and were assumed to be
symbolic of female characteristics. Ten additional cards
containing the names of five males (Jack, 
Thomas, Sam, James, and Harry) and five 
females (Mary, Betty, Irene, Sally, Barbara) 
were given to each subject. These were also 
arranged in a randomized manner. Each sub-
ject was seen individually for approximately
ten minutes and was told to “place the names
of these five men and women on these cards
(containing the 5 male and 5 female figures)
wherever you feel they belong.” After this
was done and the results recorded, the Ss
were asked to tell the examiner what each
figure looked like to them.


1 Copies of the figures are available on request
from the author.


Results 


As in Levy’s experiment, the task presented
is a simple sorting problem. By calcu-
lating the expected probability of getting any
given number of “correct” sorts and compar-
ing it with the obtained frequencies by means
of chi square, it is possible to test the null
hypothesis.² The results are presented in
Tables 1, 2, and 3.


Table 1


Predicted and Observed Frequency Distributions for
Any Given Number of Correct Matches in the
Male Psychotic Population (N = 64)


Number of
correct               Predicted    Observed
matches       p       frequency    frequency

    0       .001          .06          0
    2       .10          6.40          1
    4       .39         24.96         13
    6       .39         24.96         18
    8       .10          6.40         23
   10       .001          .06          9


Table 2


Predicted and Observed Frequency Distributions for
Any Given Number of Correct Matches in the
Female Psychotic Population (N = 48)


Number of
correct               Predicted    Observed
matches       p       frequency    frequency

    0       .001          .05          0
    2       .10          4.80          0
    4       .39         18.72          5
    6       .39         18.72         21
    8       .10          4.80         17
   10       .001          .05          5


Table 3


Predicted and Observed Frequency Distributions for
Any Given Number of Correct Matches in the
Student Nurse Population (N = 30)


Number of
correct               Predicted    Observed
matches       p       frequency    frequency

    0       .001          .03          0
    2       .10          3.00          1
    4       .39         11.70          2
    6       .39         11.70          8
    8       .10          3.00          9
   10       .001          .03         10

The chi square for Table 1 is 113.17 with 
three degrees of freedom. The result is sig- 
nificant beyond the .001 level of significance. 
Chi square for Table 2 with three degrees of 
freedom is 75.78 and is significant beyond 
the .001 level of significance. Chi square for 
Table 3 with three degrees of freedom is 
94.73. This result is significant beyond the 
.001 level of significance. 
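
The chi-square computation can be retraced from the tabled values. The sketch below is a reconstruction under stated assumptions, not the author's worksheet: it reads the chance probabilities in Tables 1 to 3 as .001, .10, .39, .39, .10, and .001, takes predicted frequencies as N times p, and pools the two extreme categories at each end, which leaves four categories and three degrees of freedom. The names are illustrative.

```python
from typing import Dict, List

# Chance probabilities for 0, 2, 4, 6, 8, 10 correct matches, as tabled.
P_CHANCE: List[float] = [0.001, 0.10, 0.39, 0.39, 0.10, 0.001]

# Observed frequencies from Tables 1, 2, and 3.
OBSERVED: Dict[str, List[int]] = {
    "male psychotic": [0, 1, 13, 18, 23, 9],
    "female psychotic": [0, 0, 5, 21, 17, 5],
    "student nurse": [0, 1, 2, 8, 9, 10],
}


def pool_extremes(values: List[float]) -> List[float]:
    """Combine the two lowest and the two highest categories."""
    return [values[0] + values[1], values[2], values[3],
            values[4] + values[5]]


def pooled_chi_square(observed: List[int], probs: List[float]) -> float:
    """Chi square of observed against N * p after pooling the extremes."""
    n = sum(observed)
    expected = pool_extremes([n * p for p in probs])
    pooled_obs = pool_extremes([float(o) for o in observed])
    return sum((o - e) ** 2 / e for o, e in zip(pooled_obs, expected))


chi_squares = {group: pooled_chi_square(obs, P_CHANCE)
               for group, obs in OBSERVED.items()}
```

Computed this way, the three values come out close to the reported 113.17, 75.78, and 94.73; small differences would be expected from rounding of the predicted frequencies.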

Table 4 contains the most frequent asso-
ciations to the presented figures given by the
three groups of subjects.


Table 4


Most Frequent Associations to the Presented Figures
in the Male and Female Psychotic Population
and Student Nurse Population


Figure      Association

            Pencil, bullet
            Lips, mouth
            Gun stock, or rifle
            Side view of a canoe, stick, branch, belt
            Pen, pencil
            Hat, bowl
            Hills, mountains, letter “m”
            Bowl, cup
            Cigarette
            Circular form or frame, letter “o,” ring


2 In calculating χ², the two extreme categories at
either end were combined because of the low pre-
dicted frequencies.


Discussion


The results of this experiment are contrary
to those obtained by Levy. It is possible that
some of the discrepancy may be explained
on the basis that distinct differences exist in
the type of subject population. Again, in the
present experiment, a different set of stimuli
was employed. However, inspection of the
two sets of stimuli suggests that they are
substantially similar. Another factor is the
possibility that Levy used group testing, al-
though this is not absolutely clear from his
report, and only individual testing was used
in the present experiment.

Before we can dismiss the existence of a 
cultural sexual symbolism, more extensive in- 
vestigation is needed. Certainly, the results 
of the present experiment point to the strong 
possibility that there is such a phenomenon 
as a symbolism which is generally accepted 
in any particular culture. 

An interesting observation made by the ex- 
aminer, although statistical proof cannot as 
yet be presented, was that the patients who 
were relatively more confused and disorgan- 
ized made many more errors in matching 
than the patients who seemed in fairly good 
states of remission. The observation raises 
the possibility that one source of illness is 
the fact that there never has been a com- 
plete acceptance on the part of the very dis- 
turbed patient of the generally prevailing 
symbolism in a given culture. Another pos- 
sibility is that there has been a regression as 
a result of emotional illness to a level where 
cultural symbolism does not yet play an 
effective role for the personality. This ap- 
pears to warrant further investigation. 


Summary 


The results of the present investigation ap- 
pear to support the hypothesis that there may 
be a symbolism in any particular culture 
which is generally accepted by those who are 
functioning relatively effectively. The possi- 
bility is raised that the inability of certain 
individuals to adapt to and accept cultural 
symbolism results in emotional disturbance. 


Received May 6, 1955. 
Temporal Acuity of Vision, Audition, and Touch
in Psychogenic and Neurogenic Pathology¹


Albert F. Ax 


University of Washington 


and William H. Colley 
VA Mental Hygiene Clinic, Huntington, West Virginia 


Critical fusion frequency, as Geldard (6, 
p. 89) suggests, may be considered as a meas- 
ure of visual “temporal acuity.” Temporal 
acuity is the ability to resolve two or more 
discrete stimuli appearing close together in 
time. As the time is decreased between recur- 
ring stimuli, the critical frequency is finally 
reached at which the separate stimuli lose 
their discreteness with a noticeable and re- 
portable change in the quality of the sensa- 
tion. For visual stimuli at frequencies above 
the critical flicker fusion point, the sensation 
is a steady light. Below the CFF frequency 
there is a sensation of flicker. The many 
studies of visual temporal acuity (CFF) 
have recently been reviewed by Hecht and 
Verrijp (8) and by Simonson and Brozek 
(14). 

Temporal acuity for other sense modalities 
has not been so extensively studied. In dis-
cussing “auditory flutter,” Geldard reports
the phenomenon not to be comparable to
visual critical flicker fusion, since “if the
tonal cutoff is really effective the ‘critical
fusion point’ is never reached . . . [it] is in 
no sense the ‘fused’ continuity of the kind 
found in vision above the critical frequency 
of flicker” (6, p. 137). It is clear from the
context that Geldard is oriented in this discussion
to the nature of the sensation above the criti- 
cal point as related to problems of neural 
persistence. If, on the other hand, the aspect 
of “temporal acuity” is considered, the in-
vestigation can be focused on whether sub-
jects can consistently report the point at
which the sensation of separate stimuli is
lost (with rising frequency), or when the
sensation of discreteness reappears with de-
creasing frequency.


1 From the Department of Psychiatry, University
of Washington School of Medicine and the Veter-
ans Administration Hospital, Seattle, Washington.

Temporal acuity of the skin to tactual and 
electric stimuli has been largely neglected in 
the long discussions of the “vibratory sense” 
which Geldard (5) has thoroughly reviewed. 
He does, however, report a few findings on 
the lower frequency limit for the “vibratory 
sense,” that is, the lowest frequency at which 
separate impacts are felt as a “vibration” or 
separate shocks as a “tingle.” Knudsen (10) 
found the lower limit to be from 12.5 to 
18.00 per second for shocks with an average 
of 15.3 for two observers. Kampik (9) and 
Brecher (4) believed it to be 16 cycles per 
second for tactual impacts. 

These studies indicate that temporal acuity 
for the tactual sense can be identified. It is 
clearly recognized, however, that as with the 
auditory temporal acuity, when the stimulus 
frequency is above the critical value, the ex- 
perience is quite different from the smooth, 
continuous sensation of the steady-appearing 
light when above the CFF frequency. 

Since the various sense modalities produce 
a variety of experiences when stimulated at 
frequencies above the temporal acuity thresh- 
old, it is essential to define the temporal 
acuity threshold by the uniform referent of 
“loss of discreteness.” Seeing, hearing, or feel- 
ing separate, discrete stimuli is a general 
experience, not dependent on the particular 
sense modality. Accordingly, in this experi-
ment the temporal acuity threshold is defined
as the mean frequency at which the experi- 
ence of discreteness is lost with increasing 
frequencies and reappears with decreasing 
frequencies. 
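
This definition can be expressed as a short computation. The sketch below is a hedged illustration only: it assumes that each trial yields one frequency reading, that ascending and descending series are weighted equally, and that the readings shown are invented for the example. The function name is hypothetical.

```python
from statistics import mean
from typing import List


def temporal_acuity_threshold(ascending: List[float],
                              descending: List[float]) -> float:
    """Mean of the ascending and descending threshold estimates.

    `ascending` holds the frequencies at which discreteness was lost as
    frequency rose; `descending` holds the frequencies at which it
    reappeared as frequency fell. Each direction gets equal weight.
    """
    return mean([mean(ascending), mean(descending)])


# Invented readings, in cycles per second, for one subject and modality.
loss_points = [41.0, 43.0, 42.0]
reappearance_points = [38.0, 39.0, 40.0]
threshold = temporal_acuity_threshold(loss_points, reappearance_points)
```

Averaging the two directions is one common way to offset the tendency for ascending and descending series to give different readings; the report itself does not state how many trials of each kind were pooled.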

Critical fusion frequency of vision (CFF) 
has repeatedly been found to be lower in 
various pathological conditions of the central 
nervous system produced by toxins, anoxia, 
and physical damage to neural tissue. Bat- 
tersby, Bender, and Teuber (2) recently re- 
ported on a study of CFF in patients with 
long-standing lesions of either the frontal or 
occipital lobes. On the basis of their findings, 
they concluded that transient depressions in 
visual CFF may occur after frontal lobe le-
sions, but that injuries to higher visual path-
ways are necessary to produce a lasting re-
duction in CFF. They further hypothesized
that transient reductions of CFF might ap- 
pear after cortical ablation of other nonvisual 
areas involving, for example, the temporal or 
parietal lobes. 

If this view is correct, the usefulness of 
visual CFF alone, as a device for localizing 
brain pathology, would be limited to state- 
ments of the absence or presence of an acute 
or new pathology of unknown position in the 
cortex, or the presence of an old lesion in 
higher visual pathways. In any given pa- 
tient, which of these is the case could hardly 
be specified with much confidence on the ba- 
sis of visual CFF data alone. Were it pos- 
sible to demonstrate that lasting depressions 
of temporal acuity in other sense modalities 
follow involvement of their higher pathways 
and projection areas, localization of old le- 
sions could be more precise. As in vision, a 
new or acute pathological condition of any 
lobe might still result in depressed temporal 
acuity in other modalities, but the data ob- 
tained from their simultaneous study might 
reveal different degrees of depression and 
make possible localization of pathology while 
it is still new or in an acute phase. 

Halstead (7) has suggested a central
mechanism of the cortex which limits the re- 
solving power of the primate visual system. 
On the other hand, there is the possibility 
that each of the senses has its own integrating
mechanism or center governing its temporal
resolving power, which would permit the
temporal acuity of the various senses to be 
independent of each other, so long as lesions 
are not disruptive of the general physiologi- 
cal economy of the total cortex. Where a new 
or acute disturbance affects the total cortex, 
temporal acuity of all modalities may be de- 
pressed, giving the appearance of a central 
fusion mechanism. 

The present investigation seeks to deter- 
mine, first, whether temporal acuity can be 
reliably measured for auditory and tactual 
sense modalities and secondly, whether this 
extension to other sense modalities will pro- 
vide an increase in power for the diagnosis 
and localization of brain pathology. A third 
question as to whether the fusion mechanism 
is a general function of the cortex as a whole 
or whether it is specific to the sense modality 
will be explored, but it is unlikely to be 
clearly answered by this study. 


Subjects 


All subjects were male patients at the Vet- 
erans Administration Hospital, Seattle, Wash- 
ington. Acceptability was contingent upon be- 
ing ambulatory, capable of understanding and 
following instructions, and remaining hospitalized
for a period permitting careful medical
evaluation.

The neuropathology group consisted of 22 
patients. The criteria for neuropathology were 
positive neurological signs, abnormal electro- 
encephalograms and, where available, positive 
X ray and neurosurgeons’ reports. Except as 
implied above, the subjects were unselected 
as to extent or location of pathology. Final 
diagnoses revealed three patients with frontal 
lobe lesions, two with parietal lobe lesions, 
two with temporal lobe lesions, one with a 
cerebellar lesion, three with subcortical le- 
sions, and eight with generalized cortical 
pathology. Of the remaining three, lesions 
were found in the occipital, temporal, and 
parietal lobes of one, in the temporal and 
frontal lobes of the second, and in the tem- 
poral and occipital lobes of the third. Anti- 
convulsant and/or sedative drugs were being 
administered to 10 of the group on the date 
of testing. Ages ranged from 22 to 62 years, 
with a mean age of 40.4 years. Cerebrovascular
accident patients, by virtue of their generally
higher age and ease of diagnosis, were
eliminated from consideration.

The psychopathology group of 21 consisted 
of five patients initially suspected, but not 
subsequently diagnosed as having brain pa- 
thology, and 16 patients not suspected or 
found to have neuropathology during their 
medical evaluation. Selected at random from 
the psychiatric wards, 11 were classified as 
psychoneurotics (three conversion reactions 
and eight other or mixed subtypes), seven as 
psychotic (two psychotic depressions and five 
schizophrenic reactions), and three as char- 
acter disorders. Anticonvulsant and/or seda- 
tive drugs were being administered to four of 
the group when tested. Ages ranged from 25 
to 62 years, with a mean age of 37.7 years. 


Apparatus 


The intermittent stimuli were produced by 
a General Radio Strobotac (type 631-B) 
equipped with amplifier, speaker, tactual 
stimulator, and neon tube. This oscillator had 
two pulse frequency ranges of 10 to 60 cycles 
per second for the low range, and 40 to 420 
cycles per second for the high range. 

For the visual stimuli, a type NE 40 neon 
tube was plugged directly into the “strobolux” 
output jack, which provided a brief, high 
voltage pulse of less than 30 microseconds’ 
duration. The NE 40 tube was painted black 
except for a $-inch circular window directly 
over the glow plate. The tube was viewed 
through a light-tight black box, the light 
source being 40 inches from the subject’s 
eyes. 

Tactual stimuli were produced by a bamboo 
stylus projecting vertically through an ad- 
justable washer mounted on a horizontal sur- 
face of convenient height for the subject to 
rest his hand. The stylus was operated by a 
relay modified to have approximately one 
millimeter stroke. Since the touch vibrator 
could not be made completely noiseless, a 
masking sound was produced (during touch 
only) by a 60-cycle hum through earphones 
from the 6.3-volt a.c. heater supply of the Strobotac.
The hole in the washer on which the
subject rested his finger was very small 
(about 4 inch in diameter) which prevented 
even large variations in finger pressure from
damping or impeding the stroke of the stylus.


In all cases, the stylus was felt to continue to 
vibrate at all frequencies used. 

Auditory and electric stimuli were provided 
by a condenser-coupled cathode follower amplifier.
The rectified output of this amplifier
was switched either to a 3-inch speaker via a 
25,000-ohm output transformer, or to two }- 
inch silver electrodes attached to the volar 
surfaces of two fingers on one hand with elec- 
trode jelly and tape. The sound produced was 
a clean-cut “tick.” The shock was also a 
clean-cut, discrete “jolt.” A volume control 
provided an intensity range from below 
threshold to an intensity well above threshold 
for all subjects for both auditory and shock 
stimulation. 


Procedure 


The temporal acuity thresholds for the four 
types of stimulation were determined in the 
following order: (a) vision, (b) audition,
(c) touch, and (d) electric. Twenty alternate 
up and down trials were given for each 
modality. 

The first starting frequency was 10 cycles 
per second and thereafter randomly varied 
from 10 to 20 cycles per second above and 
below the previous threshold. Rate of change 
was one cycle per second per second. 

Instructions were similar for all tests. For 
example, the auditory temporal acuity thresh- 
old (ATA) instructions were: “Can you hear 
this ticking sound? You can hear each tick 
separately, but as I speed it up they will 
reach a rate so fast you cannot hear the 
separate ticks, but rather, the sound will be- 
come a hum or buzz. Say ‘now’ when you 
lose track of the separate ticks.” (The E
demonstrated.) “Now as I slow it down, the 
separate ticks will come back. Tell me when 
you hear the separate ticks again.” 

The instructions for visual (VTA), touch 
(TTA), and electric (ETA) temporal acuity 
frequency thresholds were similar. In all cases 
the disappearance and reappearance of the 
intermittent aspect of the phenomenon were 
indicated as the focus of attention, although 
the natures of the sensations produced by 
frequencies above the thresholds were named, 
such as hum or buzz for audition, vibration 
for touch, and “tingle” for electric. The shock 
current was set at 1.5 times the mean of three 

Table 1

Temporal Acuity Values for Four Stimulus Modalities
Average of Up Plus Down Trials

            Vision           Audition            Touch          Electric
Group     Mean       S²     Mean        S²     Mean      S²     Mean      S²
PP       31.30    21.51    68.63  1,041.33    34.62   31.29    33.73   29.56
NP       28.59    20.88    53.56    853.19    32.20   78.14    30.94   48.71
Diff.     2.71             15.07               2.42             2.79
F                                                      2.50             1.65
t         2.73              2.27               1.08             2.07
Prob.     <.01              <.05               >.10     .02     <.05     .10


rising threshold determinations. Because of 
sensory adaption or rising resistance, it was 
found necessary to redetermine the threshold 
every fifth trial. Most subjects reported this 
shock intensity of 150 per cent of threshold 
to be clearly perceptible, but not uncom- 
fortable. 

Except for the shocks, the stimuli were set 
at a constant intensity, as is customary in 
CFF studies. It seems likely that greater 
reliability of temporal acuity measurement 
could be achieved if the stimulus intensity were
adjusted to some standard decibel level above 
the subject’s threshold. 


Results 


The customary score for flicker fusion is the 
mean frequency for the up and down trials 
averaged together. In Table 1 these scores 
are tabulated for each of the four stimulus 
modalities. On the first line are the means 
and variances (computed with N-1) for the 
psychopathology (PP) group of 21, and just 
below, for the neuropathology (NP) group
of 22. The mean on visual temporal acuity
(VTA) for the psychopathology group is 
31.30, and for the neuropathology group, 
28.59 cycles per second. The average difference
of 2.71 cycles per second has a t value
of 2.73 and a null probability of less than 1 
per cent. In the next column for auditory 
temporal acuity (ATA), the difference is 
15.07 cycles per second, but because of the 
extremely large variances, the null significance 
level falls between 1 per cent and 5 per cent. 
The touch temporal acuity (TTA) does not 
differ significantly for the two groups, although 
the mean is lower for the neuropathology 
group. The variance is 2.5 times as large for 
this group, with an F ratio significant at the 
2 per cent level. The mean electrical temporal 
acuity (ETA) for the neuropathology group 
is 2.79 cycles per second lower, a difference 
significant at the .05 level. Here again, the 
variance is considerably larger for the pathol- 
ogy group although the F ratio is not sig- 
nificant. 
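
The variance ratios quoted above can be checked against the variances tabulated in Table 1. The sketch below is an illustration only, not the authors' computation; it assumes the F ratio is the larger group variance divided by the smaller, and the function name `variance_ratio` is hypothetical.

```python
def variance_ratio(var_a: float, var_b: float) -> float:
    """Return the larger variance divided by the smaller, so F >= 1."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return larger / smaller

# Group variances taken from Table 1 (PP group first, NP group second).
touch_f = variance_ratio(31.29, 78.14)
electric_f = variance_ratio(29.56, 48.71)

print(f"Touch F = {touch_f:.2f}, Electric F = {electric_f:.2f}")
```

Under that assumption the two ratios agree with the tabled F values of 2.50 and 1.65 to two decimals.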

In order to check whether the up and down 


Table 2

Errors of Habituation for Four Stimulus Modalities
Average of Up Minus Down Trials

            Vision           Audition            Touch          Electric
Group     Mean       S²     Mean        S²     Mean      S²     Mean      S²
PP       -1.55     3.64    13.47    397.94     1.11   57.41      .21   27.42
NP         .13     4.79    17.53    599.01     6.89   60.44     4.12   49.55
Diff.    -1.68             -4.06              -5.78            -3.91
F                                                                       2.03
t         3.80               .26               3.59             2.81
Prob.     <.01              >.10               <.01             <.01     .06


Table 3
Average of Up Trials

                Vision                  Audition                   Touch                  Electric
Group     Mean        S²    S_i²   Mean        S²    S_i²   Mean        S²    S_i²   Mean        S²    S_i²
PP       30.52     19.72    1.20  75.36  1,644.82  110.95  35.17     34.71   13.30  34.84     27.21   15.89
NP       28.66     16.40    1.57  62.32  1,443.35  383.45  35.65     95.45   20.75  33.00     63.20   30.70
Diff.     1.86                    13.04                     -.52                      .84
F                                                                     2.76                     2.33
t         2.03                     1.54                      .27                      .61
Prob.     <.05                     <.10                     >.10       .05           >.10       .05


trials might be contributing differentially to 
these group differences, the up and down 
trials are averaged separately and posted in 
Tables 3 and 4, respectively. It is immediately 
apparent that the down trials are contributing 
most of the difference, since the down-trial
means are significantly lower for the neuropathology
group in all stimulus modalities. On the up
trials, only the VTA shows a difference
significant at the .05 level.

In order to isolate this factor which is 
contributing to the down trials over and 
above the temporal acuity level itself, as 
measured by the mean of the ups plus downs, 
the differences were taken between the ups 
and downs which are listed in Table 2. All 
but vision are heavily loaded with what may 
be described as an “overshoot” factor. This 
“overshoot” tendency has been described by 
Underwood (15) as “error of habituation,” 
and when negative, as the “error of expecta- 
tion.” The neuropathology group tends to 
“overshoot” their temporal acuity level; that 
is, they go higher on the up trials and lower
on the down trials. In the visual modality the
psychogenic group has a mean “error of 
expectation” of —1.55 while the neurogenic 
group has an almost zero mean error of .13. 
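
The two derived scores used in this section follow from simple arithmetic on the up-trial and down-trial means. The sketch below illustrates this with the vision means from Tables 3 and 4; the function name `acuity_and_habituation` is hypothetical, and the code is a reading of the definitions in the text rather than the authors' procedure.

```python
def acuity_and_habituation(up_mean: float, down_mean: float) -> tuple[float, float]:
    """Return (temporal acuity, up-minus-down error) for one group."""
    acuity = (up_mean + down_mean) / 2.0  # average of up and down trials
    error = up_mean - down_mean           # positive: habituation; negative: expectation
    return acuity, error

# Vision means from Table 3 (up trials) and Table 4 (down trials).
pp_acuity, pp_error = acuity_and_habituation(30.52, 32.07)
np_acuity, np_error = acuity_and_habituation(28.66, 28.53)

print(f"PP: acuity {pp_acuity:.3f}, error {pp_error:.2f}")
print(f"NP: acuity {np_acuity:.3f}, error {np_error:.2f}")
```

The results agree with the vision entries of Tables 1 and 2 to within rounding of the second decimal.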

Each subject’s individual variances from 
trial to trial are tabulated in the columns of 
Tables 3 and 4 headed “S_i²,” with the up
and down trials separate. With one exception, 
that is, the auditory down trials, these indi- 
vidual variability scores average higher for 
the neuropathology group. However, the variability
scores themselves have such large variances
in both groups as to make the t tests
of the differences between groups not sig- 
nificant. 

The correlations in Table 5 show an in- 
teresting trend. On the measures of temporal 
acuity, the neuropathology group have very 
small correlations between the four modalities, 
all being insignificant except that between 
touch and electric. This high correlation be- 
tween the touch and electric stimulus mo- 
dalities suggests these two modalities ap- 
parently involve the same sensory projection 
areas. In contrast to the low average inter- 


Table 4
Average of Down Trials

                Vision                  Audition                   Touch                  Electric
Group     Mean        S²    S_i²   Mean        S²    S_i²   Mean        S²    S_i²   Mean        S²    S_i²
PP       32.07     27.20    1.41  61.89  1,130.32  106.64  34.06     55.52   16.19  33.63     41.64   19.50
NP       28.53     27.26    1.75  44.79    563.66   79.13  28.76     90.60   22.53  28.88     59.39   29.46
Diff.     3.54                    17.10                     5.30                     4.75
F                                            2.00
t         3.15                     2.72                     2.88                     3.10
Prob.     <.01                     <.01       .06           <.01                     <.01


Table 5

Intercorrelations* Between Temporal Acuity Measures of Four Stimulus Modalities

                     Psychopathology             Neuropathology       Combined groups
                         N = 21                      N = 22               N = 43
Variables          U+D   U-D    Up  Down       U+D   U-D    Up  Down        Down
VA                  56    28    56    53        00    31    01    09          40
VT                  57    32    57    53        09    56    09    15          38
VE                  55    15    75    46        02    49    00    13          35
AT                  57    54    54    60        26    76    42    24          46
AE                  57    47    55    56        41    44    51    37          50
TE                  75    85    75    78        55    85    56    67          74
Mean r (via z′)     60    48    63    59        24    60    28    29          48

* Decimal points omitted.


correlation of .24 for the neuropathology 
group, the TA modalities are well correlated 
for the psychopathology group, averaging .60. 
This finding of contrasting intercorrelations 
for the two groups suggests the possibility 
that specificity of the brain lesion is revealed 
by specificity of temporal acuity in that 
sensory modality mediated by the particular 
brain area damaged. The intercorrelations for 
the up minus down trials appear to show an 
opposite trend. For these “habituation” scores, 
the neuropathology group have an average 
correlation of .60 contrasted to .48 for the 
psychopathology group. “Habituation” as an 
attitude or personality characteristic would 
be expected to be a generalized effect and not 
specific to the modality. The various mo- 
dalities may vary, of course, in their effective- 
ness as a reliable measure of the “habituation” 
characteristic. 
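
Table 5 reports mean correlations obtained through the z transformation. A minimal sketch of that averaging is given below, assuming an unweighted mean of Fisher z values; the function name `mean_r_via_z` is hypothetical, and the input is the U+D column for the psychopathology group as printed in Table 5.

```python
import math

def mean_r_via_z(correlations: list[float]) -> float:
    """Average correlations by Fisher's r-to-z transformation (unweighted)."""
    z_values = [math.atanh(r) for r in correlations]  # r -> z
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)                          # z -> r

# U+D intercorrelations, psychopathology group (Table 5, decimals restored).
pp_up_plus_down = [0.56, 0.57, 0.55, 0.57, 0.57, 0.75]
pp_mean_r = mean_r_via_z(pp_up_plus_down)
print(f"Mean r = {pp_mean_r:.2f}")
```

Because the tabled correlations are already rounded to two places, means recomputed this way may differ from the printed row by a unit in the second decimal.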

An attempt was made to examine more 
directly the relationship between the locus of 
lesion and the modality most affected by 
grouping the subjects into frontal, parietal, 
temporal, and occipital lobe lesions. No rela- 
tionship, however, was apparent, which may 
be a function of the small size of the sub- 
groups and the lack of certainty of the precise 
localization for many. 

There were five cases with known unilateral 
lesions. These were tested on both hands for 
touch and electricity. There was no trend for 
the temporal acuity threshold to be related to 
laterality of lesion. 


Finally, the effect of sedatives or anticon- 
vulsants, which some of the patients were 
taking, was related to temporal acuity. Of the 
neuropathology group, 10 were receiving drugs 
and 12 were not. The medicated group had 
higher temporal acuity thresholds on all sense 
modalities. The “error of habituation” was 
less for the medicated group. Since only four 
(19%) of the psychopathology group were 
receiving drugs, in contrast to 10 (45%) of 
the neuropathology group, these findings in- 
dicate that not only could the drugs not 
account for our general findings of lower 
temporal acuity thresholds and higher habitu- 
ation errors for the neuropathology group, 
but, rather, the drug effect may have at- 
tenuated the differences found. 


Conclusions 


1. These results indicate that temporal 
acuity thresholds for audition and touch can 
be measured with sufficient reliability to be 
useful for the diagnosis of cerebral pathology. 
The best single indicator is the vision “error 
of habituation” which correctly diagnosed 33 
cases (76%), using the median as the cut-off 
point. Vision down trials were almost as good. 
When the touch variability scores were com- 
bined with these two, all three being equally 
weighted, 35 cases (81%) were correctly 
classified with three psychopathology and 
five neuropathology cases being misclassified. 
Hence, for these samples there is a gain in
diagnostic power by combining several sense
modality acuity thresholds and “errors of 
habituation” into a battery. 

2. The question as to whether fusion fre- 
quency is a generalized function of the brain 
as a whole, or specific to each sense modality, 
remains obscure. While the lower intervariable 
correlations for the pathology group argue for 
specificity, our available samples with specific 
localized lesions are too small to draw any 
satisfactory conclusion. 

3. The finding that the “error of habitua- 
tion” score is reflecting a characteristic highly 
diagnostic of cerebral pathology suggests that 
other psychophysical measures, not involving 
temporal acuity, should be investigated as 
measures of this perseverative tendency often 
seen in patients with cerebral pathology. 
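
The median cut-off classification mentioned in the first conclusion can be sketched as follows. This is an illustration under stated assumptions, not the authors' scoring: it assumes the pooled median is the cut-off and that scores above it are called neuropathology; the function name and the demonstration scores are hypothetical. Only the counts (33 of 43, and 43 less 3 and 5 misclassified) come from the text.

```python
from statistics import median

def median_cutoff_accuracy(pp_scores: list[float], np_scores: list[float]) -> float:
    """Call scores above the pooled median NP; return the proportion correct."""
    cutoff = median(pp_scores + np_scores)
    correct = sum(score <= cutoff for score in pp_scores)
    correct += sum(score > cutoff for score in np_scores)
    return correct / (len(pp_scores) + len(np_scores))

# Hypothetical scores, for illustration only (not the study's data).
pp_demo = [-3.0, -2.0, -1.5, -1.0, 0.5]
np_demo = [-0.5, 0.0, 1.0, 1.5, 2.0]
demo_accuracy = median_cutoff_accuracy(pp_demo, np_demo)

# Proportions implied by the counts reported in the text (N = 43).
single_indicator = 33 / 43
battery = (43 - 3 - 5) / 43
print(f"{demo_accuracy:.2f} {single_indicator:.3f} {battery:.3f}")
```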


Received May 2, 1955. 


Social Desirability and Q Sorts


Allen L. Edwards 


The University of Washington 


In Q technique, as used in clinical and per- 
sonality research, subjects are often asked to 
describe themselves in terms of a set of per- 
sonality items. If these items differ with re- 
spect to their social desirability scale values, 
then social desirability may function as an 
uncontrolled variable in the Q sorts of the 
subjects. The potential influence of social de- 
sirability on the statistical analysis of Q sorts 
has been described in some detail by Edwards 
and Horst (2). 

Edwards and Horst assumed that social de- 
sirability operated in such a way that the 
weights assigned to the items by subjects 
would be an increasing function of the social 
desirability scale values of the items. This as- 
sumption was based upon research by Ed- 
wards (1). He found that the probability of 
endorsement of an item in a personality in- 
ventory was an increasing linear function of 
the social desirability scale values of the 
items. No evidence was presented, however, 
to indicate that responses of subjects to items 
in Q sorts would be influenced in the same 
manner by differences in social desirability 
values of the items as was found in the case 
of responses to a personality inventory. The 
present study was undertaken to provide evi- 
dence upon this point. 

Subjects consisted of a group of 50 men 
and a group of 50 women students at the 
University of Washington and at Oregon 
State College. Q sorts for the Oregon students 
were made available by Matthew J. Trippe. 
The procedure described by Stephenson (3, 
pp. 59-61) was followed in obtaining the Q 
sorts. Each subject gave a description of him- 
self in terms of the 135 personality items 
used in the earlier study by Edwards. Eleven 


categories, ranging from most characteristic 
to least characteristic, were used by the sub- 
jects in making their self-descriptions. The 
frequencies for each category were fixed to 
resemble a somewhat flattened normal dis- 
tribution. The number of items to be placed 
in the successive categories was 5, 7, 8, 14, 
20, 27, 20, 14, 8, 7, and 5. The weights as- 
signed to the categories consisted of the 
integers 1 to 11, with the 11 weight being 
assigned to the most descriptive category. 
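
The forced distribution described above can be checked with a few lines of arithmetic. The sketch below simply confirms that the category frequencies account for all 135 items and shows the mean weight any complete sort must have; the variable names are illustrative.

```python
# Items per category as listed in the text; weights run from 1 to 11.
category_counts = [5, 7, 8, 14, 20, 27, 20, 14, 8, 7, 5]
weights = range(1, 12)

total_items = sum(category_counts)
mean_weight = sum(w * n for w, n in zip(weights, category_counts)) / total_items

print(total_items, mean_weight)
```

Since the distribution is symmetric, every subject's sort has the same mean weight, so differences between subjects lie entirely in which items receive which weights.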

For each group of 50 subjects the mean 
weight assigned to each of the 135 items was 
found. These mean weights were then corre- 
lated with the social desirability scale values 
as determined previously by Edwards. The 
product-moment correlation for the male 
group was .84 and for the female group it 
was .87. If these results are typical of those 
to be expected in Q studies involving self de- 
scriptions in terms of personality items dif- 
fering in social desirability values, then one 
might predict fairly successfully, on the aver- 
age, the Q sorts of subjects, provided only 
that we have available the social desirability 
scale values of the items used in obtaining 
the Q sorts. 
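
The analysis above amounts to a product-moment correlation between two values per item: its social desirability scale value and its mean Q-sort weight. A minimal sketch follows; the function name `pearson_r` and the five-item data are hypothetical and are not Edwards' values.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for five items (illustration only).
scale_values = [1.2, 2.5, 3.1, 4.0, 4.6]  # social desirability scale values
mean_weights = [3.0, 5.5, 5.0, 7.5, 8.0]  # mean Q-sort weight per item
r = pearson_r(scale_values, mean_weights)
print(f"r = {r:.3f}")
```

In the study the same computation would run over all 135 items, once for each group of 50 subjects.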


The Reliability of Adjustment Ratings
and the Length of Case Histories 


A. W. Bendig 


University of Pittsburgh


Several methodological questions in the use 
of rating scales to quantify clinical judgments 
have been investigated in recent research (3, 
4). Bendig and Sprague (4) had ten short 
clinical case histories, culled from psychology 
textbooks, rated for adjustment level by four 
groups of Ss: introductory psychology stu- 
dents, students of abnormal psychology, grad- 
uate students in psychology, and post-Ph.D. 
clinical psychologists. The first two groups 
were subdivided into groups using rating 
scales differing in length: 3, 5, 7, or 9 cate- 
gories. A progressive increase in rater relia- 
bility (average interjudge correlation) with 
increasing educational background was found. 
Also noted was a strong positive relation be- 
tween scale length and rater reliability for 
the introductory psychology Ss, but this re- 
lationship was not as evident for the abnormal 
psychology Ss. In the second study, Bendig 
(3) used the longer case histories abstracted 
by Cummings (5) from clinical files. Two 
groups of Ss, introductory educational psy- 
chology students and graduate students in 
psychology, rated the 10 cases for adjust- 
ment level using a 7-category scale. In addi- 
tion, each S indicated which case history sec- 
tion was most important in determining his 
rating of the case. The groups were compared 
on the magnitude of several types of rating 
errors shown and on their use of various case 
sections. 

The most obvious difference in the results 
of the two studies (3, 4) was in the magni- 
tudes of rater reliability reported. Bendig and 
Sprague reported (4, p. 208) that the average 
intercorrelation among the introductory psychology
Ss using a 7-category scale was .47
(N = 21) while the reliability for the psychology
graduate students (N = 16) was .59.
The comparable reliabilities for the two simi- 
lar groups in the second report (3) were .82 
and .84. Changing the rated stimuli (case his- 
tories) not only increased the size of the rater 
reliabilities, but also tended to reduce the 
difference in reliability of these educationally 
distinct groups of judges. One hypothesis to 
explain these differences is that longer case 
histories (reporting more of the clients’ be- 
havior) would be rated more reliably, since 
the first group of cases were quite brief (ap- 
proximately 100 words each) while the second 
set were much longer (one to two pages of 
single-spaced typing each). A second hypothe- 
sis suggests that the second set of cases, drawn 
from clinic files, covers a wider range of ad- 
justment, or is more heterogeneous on the 
rated continuum (adjustment level) than is 
the first set, which was abstracted from text- 
books. It has been shown (1, 2) that more 
heterogeneous stimuli are rated more reliably 
by introductory psychology Ss. 

The present research was designed to test 
this first hypothesis. The case histories in the 
second set were systematically shortened until 
they approximated the length of the first set. 
Both the “long” and “short” versions of the 
same cases were rated by different groups of 
introductory psychology Ss which were fur- 
ther subdivided into groups differing in the 
length of the rating scale used. This latter 
step was designed to test for the appearance 
of the positive relation between scale length 
and rater reliability previously reported (4). 


Procedure 


Stimuli. Two sets of ten case histories each 
were used as stimuli to be rated. The “long” 


463 


ae 
| 
at 
by 
4 
| 
» 
| 
| 
a 
iy 
if 


464 A. W. Bendig 


set consisted of the case histories originally 
compiled by Cummings (5) and later used by 
Newton (9) and Bendig (3). Each case had 
been abstracted from the complete clinic file, 
excluding psychometric information, with the 
S’s behavior reported under seven areas (3). 
The cases varied in length from one to two 
pages of single-spaced typing. In a previous 
study (3) both undergraduate psychology and 
graduate psychology students had rated these 
cases for adjustment level and had also in- 
dicated which of the seven case sections was 
most important in determining the adjustment 
rating they gave the case. The case sections 
checked by one-third or more of the raters 
in either group as being important have been 
reported (3). The “short” set of case his- 
tories used in the present study consisted of 
only the sections judged by either of the two 
previous groups as being important. The “un- 
important” sections for each case were omitted 
in typing the cases for presentation to the 
raters. Each case in the “short” set was about 
one-half page of single-spaced typing, or ap- 
proximately one-half to one-third its length in 
the long set. The cases of either the long or 
short set were mimeographed as a booklet 
with a face-sheet of rater instructions request- 
ing the judge to read and rate each case for 
adjustment level, using a rating scale pro- 
vided on a separate card. 

Scales. Three different rating scales, differ- 
ing in the number of categories on the scale 
(scale length), were mimeographed on cards 
with spaces for recording the judge’s name, 
sex, age, and the adjustment rating given 
each case. The scales had either five, seven, 
or nine categories with the lowest category 
given a numerical weight of one and the 
highest category a weight of five, seven, or 
nine depending on scale length. Intermediate 
categories were given appropriate unit weights. 
The lowest category was verbally anchored by 
the descriptive phrase “Slightly maladjusted,” 
the center category by the phrase “Moderately 
maladjusted,” and the highest category by the 
phrase “Extremely maladjusted.” The in- 
termediate categories were left unanchored. 
These scales, both in length and in type of 
verbal anchoring, are identical with those used 
in previous research (3, 4). 

Subjects. The judges were 120 undergraduate
students enrolled in three sections of a
course in introductory psychology. The case 
history booklets and scales were randomly 
distributed to the Ss approximately midway 
in the semester, with 20 Ss getting each of the
six combinations of case history length and scale
length. The Ss rated the cases outside of 
class and returned the booklets and scales at a 
subsequent class meeting. The usual request 
for noncollaboration among the Ss was made 
in distributing the booklets and in the face- 
sheet of rater instructions. 


Results 


The case histories by Ss matrix of ratings 
for each of the six subgroups (two case lengths 
by three scale lengths) were analyzed as a 
two-criterion analysis of variance, yielding 
mean squares for Cases (df = 9), Raters (df 
= 19), and Error (df = 171). The results of 
these six analyses can be found in Table 1.¹
The Cases mean squares were significant at 
the .01 level in each subgroup, indicating that 
the Ss significantly discriminated between the 
case histories in their judgments of adjust- 
ment level. The Raters mean square for the 
subgroup using the 7-category scale to rate 
the long case histories was significant at the 
.01 level, while the similar mean square for
the group rating the short cases was significant
only at the .05 level. Neither of the
Raters mean squares for the subgroups using
9-category rating scales was significant at
the .05 level. Differences exist among raters
in the average rating they assign all ten cases, 
but these rater differences, which have been 
previously termed “rater bias” (1, 2, 3, 4), 
were reduced with the longer scales. 
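
The two-criterion analysis described above (one rating per case-by-rater cell) can be sketched as follows. This is a generic textbook computation offered as an illustration, not a reproduction of the author's worksheets; the function name `two_way_anova` and the small demonstration matrix are hypothetical. With 10 cases and 20 raters it yields the degrees of freedom quoted in the text (9, 19, and 171).

```python
def two_way_anova(ratings: list[list[float]]) -> dict[str, float]:
    """Two-criterion ANOVA, one rating per cell. Rows are cases, columns raters."""
    n_cases, n_raters = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n_cases * n_raters)
    case_means = [sum(row) / n_raters for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n_cases for j in range(n_raters)]

    ss_cases = n_raters * sum((m - grand) ** 2 for m in case_means)
    ss_raters = n_cases * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_cases - ss_raters  # residual (interaction) term

    df_cases, df_raters = n_cases - 1, n_raters - 1
    df_error = df_cases * df_raters
    ms_error = ss_error / df_error
    return {
        "df_cases": df_cases, "df_raters": df_raters, "df_error": df_error,
        "F_cases": (ss_cases / df_cases) / ms_error,
        "F_raters": (ss_raters / df_raters) / ms_error,
    }

# Hypothetical 3 cases x 3 raters matrix (illustration only).
demo = [[1, 2, 3], [2, 4, 3], [5, 6, 7]]
result = two_way_anova(demo)
print(result)
```

A significant Cases F indicates that raters discriminate among the cases; a significant Raters F corresponds to what the text calls rater bias.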

The pairs of rater subgroups using similar 
length scales, but differing in the case length 
rated, were combined and three new analyses 
of variance, similar to the model described by 
Edwards (6, pp. 284-296), were computed. 
To test the feasibility of combining the pairs 
of subgroups, the ratios of the Error mean 
squares from the two groups were obtained 
and found not to be significant at the .10 
level. The results of these analyses can be 
found in Table 2,¹ with the tests of the
homogeneity of the error variances given at
the bottom of the table. The Cases mean
squares were significant at the .01 level in all
three analyses. The Between Groups mean
square was significant at the .05 level for the
subgroups using the 5-category scale, with the
subgroup rating the short case histories
judging them as showing more maladjusted
behavior. However, the Groups mean square
for the 7- and 9-category scales did not
approach a significant level of confidence.
The Cases by Groups interaction mean square
was significant at the .01 level for the
subgroups using the 7-category scale, but
again the similar interaction terms for the
subgroups using 5- and 9-category scales were
not significant at acceptable levels.

¹ Tables 1, 2, 3, 5, and 7 have been deposited with
the American Documentation Institute. Order Document
No. 4674 from the ADI Auxiliary Publication
Project, Photoduplication Service, Library of Congress,
Washington 25, D. C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies. Make
checks payable to Chief, Photoduplication Service,
Library of Congress.

It has been previously noted (3) that Cummings
(5) had six psychiatrists rank these
ten case histories for adjustment level and
derived scale values for the cases from the
pooled ranks. The ratings of each of the 120
Ss in the present study were correlated with
these scale values to give a measure of the
“psychiatric reliability” of each S. The 20
product-moment correlations within each subgroup
were transformed into normally distributed
variates by the usual r-to-z technique
and tested for intragroup heterogeneity (6, p.
135). None of the resulting six chi-square
values was significant at the .10 level. The
subgroup variances were then tested for intergroup
heterogeneity by Bartlett’s test (6, pp.
195-197) and the chi-square value (9.11) was
not significant at the .10 level (df = 5).
Finally, the z values were subjected to an
analysis of variance and the results are reported
in Table 3.¹ The variables of case
length (long vs. short) and scale length (5-,
7-, or 9-categories) did not significantly affect
the measures of psychiatric reliability and
the interaction of these two variables was
similarly nonsignificant.

From the data used for Tables 1, 2, and
3, average measures of psychiatric reliability
(average correlation with psychiatric judgments),
rater reliability (average correlation


Table 4 


Summary of Average Individual Rater Reliability and 
Bias Coefficients for Each Group 


Case Length Psychi- 
history of rating atric Rater Rater 
length scale reliability reliability bias 
5 .80** .64"* 
Short 7 .86°* A4 
9 85** 34 
5 .83** .62** 
Long 7 83** .76** 13” 
9 84** .28 
5 .82** .68** 
Both 7 .63** 
9 .85** .29 


* Significant at the .05 level. 
** Significant at the .01 level. 


among the raters within a subgroup), and 
rater bias (variability among Ss in the aver- 
age rating each S assigns all ten cases) were 
computed for each group. These measures are 
described in more detail elsewhere (3, 4). The 
summary values are given in Table 4. Both 
psychiatric reliability and rater reliability in- 
crease slightly (but nonsignificantly) from 
the 5- to 7-category scales, but no difference 
in these two measures is apparent between 
the 7- and 9-category scales. Rater bias de- 
creased with longer scales. There appears to 
be no consistent difference between the long 
and short case histories on any of these three 
measures. 

Mean ratings of the single cases were com- 
puted for each of the six subgroups and the 
case means were correlated with the psy- 
chiatric scale values referred to above. The 
intercorrelations of these mean ratings be- 
tween the rater subgroups were also computed 
and these product-moment correlations are 
given above the principal diagonal in Table 
5.¹ This matrix of intercorrelations between 
the rater and psychiatric subgroups was sub- 
jected to a centroid factor analysis. The analy- 
sis was replicated twice to stabilize the com- 
munality estimates and two factors were ex- 
tracted. The residual correlations between 
groups can be found below the principal di- 
agonal in Table 5 and the unrotated factor 
loadings and communalities for the seven 
groups are given in Table 6. The first general 
factor accounts for the great majority of 
covariation among the subgroups and appears 
to be a general “agreement” factor present 
in the mean case ratings of the rater sub- 
groups. Factor II is bipolar with the psy- 
chiatrists and student raters using the 7-cate- 
gory scale with the short case histories on the 
negative end of the factor continuum and the 
three rater groups rating the long cases at the 
positive end. The groups using the 5- and 9- 
category scales to rate the short cases fall 
midway between these extremes. Apparently 
the Ss rating the short cases were slightly 
more similar to the psychiatrists in their judg- 
ments than were the Ss rating the long case 
histories. 
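
As a check on how the loadings and communalities in Table 6 fit together, the sketch below recomputes each communality as the sum of the squared loadings on the two unrotated factors, which is the usual relation for orthogonal factors. The loadings are taken from Table 6 with decimal points restored; the function name is an illustrative choice and not part of the original analysis.

```python
# Unrotated loadings from Table 6, decimal points restored:
# rater group -> (factor I, factor II, reported communality h^2)
table6 = {
    1: (0.988, 0.015, 0.976),
    2: (0.971, -0.128, 0.959),
    3: (0.988, 0.013, 0.976),
    4: (0.982, 0.103, 0.975),
    5: (0.991, 0.111, 0.994),
    6: (0.991, 0.087, 0.990),
    7: (0.948, -0.138, 0.918),
}


def communality(loadings):
    """Sum of squared loadings on orthogonal factors."""
    return sum(a * a for a in loadings)


for group, (f1, f2, reported) in table6.items():
    h2 = communality((f1, f2))
    # Each recomputed value should agree with the tabled h^2 to three decimals.
    print(group, round(h2, 3), reported)
```

The squared Factor I loading alone is at least .89 for every group, which is the sense in which the first factor accounts for the great majority of the covariation.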

In Table 7¹ are given the mean ratings 
given each of the ten cases by the six 
subgroups and t tests of the significance of the 
differences between mean ratings given the 
long and short cases by subgroups using the 
same length of scale. Standard errors for the 
t tests were derived from the error mean 
squares in Table 2. The probabilities from 
the three t tests for each case were combined 
by the chi-square technique of Fisher (7, 8). 
Three of the combined chi squares were sig- 
nificant at the .01 level (cases 7, 8, and 10) 
while two were significant at the .05 level 
(cases 3 and 5). In one case (case 10) raters 
receiving short case histories rated the case as 
less well adjusted than those rating long 
cases, while the other four cases were rated 


Table 6 


Factor Loadings (Decimal Points Omitted) for Rater 
Groups Derived from Intercorrelations of Mean 
Case History Ratings of Groups 


                                  Unrotated factors 
Rater   Case     Length of 
group   length   scale             I      II     h² 
1       Short    5-Categories     988    015    976 
2       Short    7-Categories     971   -128    959 
3       Short    9-Categories     988    013    976 
4       Long     5-Categories     982    103    975 
5       Long     7-Categories     991    111    994 
6       Long     9-Categories     991    087    990 
7       Psychiatrists             948   -138    918 


A. W. Bendig 


as being better adjusted by the short case 
history subgroups. 
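
The combination of probabilities described above follows Fisher's chi-square technique: the statistic is minus twice the sum of the natural logarithms of the k probabilities, referred to chi square with 2k degrees of freedom. A minimal sketch follows. The three p values are hypothetical, since the individual probabilities are reported only in Table 7 of the extended report, and the helper names are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of chi square for even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined(p_values):
    """Return (chi square, df, combined p) for independent p values."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return stat, df, chi2_sf_even_df(stat, df)


# Hypothetical probabilities from the three t tests for one case
# (5-, 7-, and 9-category subgroups); not values from the study.
stat, df, p = fisher_combined([0.04, 0.20, 0.03])
print(round(stat, 2), df, round(p, 4))
```

With three tests per case the combined statistic always has 6 degrees of freedom, so each case in the study would be judged against the same critical values.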


Discussion 


The question of whether the elimination 
from case histories of material previously 
judged to be unimportant will adversely af- 
fect the reliability of judgments of adjust- 
ment derived from the case histories appears 
to be answered negatively. No significant dif- 
ferences were evident between the Ss rating 
the “long” and “short” cases either in aver- 
age intercorrelation among the Ss (rater re- 
liability) or their correlation with psychiatric 
judgments (psychiatric reliability). In fact, 
the factor analysis of the subgroup intercor- 
relations of mean case ratings suggested that 
the Ss rating the short cases were somewhat 
more similar in their judgments to the psy- 
chiatrists than were the Ss rating the long 
case histories. The elimination of the irrele- 
vant material from the cases may have elimi- 
nated distracting information that the psy- 
chiatrists consciously excluded from consid- 
eration in forming their judgments, but that 
the more naive Ss, reading the long cases, 
allowed to bias their ratings. 

The difference in rater reliability found in 
one study that used short case histories (4) 
and a second study in which longer cases 
were used (3) apparently cannot be explained 
on the basis of case length differences. The 
length of the cases in the present research is 
comparable to the first study (4), but the 
magnitude of reliability is comparable only 
to those reported in the second study (3). 
Apparently the lower reliabilities reported by 
Bendig and Sprague (4) were due to the 
selection of the cases from psychological text- 
books. These cases may have been much 
more homogeneous in adjustment level than 
are the present case histories and increased 
homogeneity of the rated stimuli has been 
shown to reduce rater reliability (1, 2). 

The previously reported positive relation- 
ship between rating scale length and rater 
reliability (4) was not strongly evident in 
the present results. A small, nonsignificant 
increase in reliability from the 5- to 7-cate- 
gory scale was noted, but no difference was 
found between the 7- and 9-category scales. 
This discrepancy between the two studies 
may be attributed either to omnipresent 
chance, or to the differences in the absolute 
magnitudes of rater reliability found. The 
present Ss were about as reliable as judges can 
be expected to be, while the Ss in the previous 
research showed relatively low reliability. The 
most tenable hypothesis is that there is a 
positive relationship between scale length and 
rater reliability, but that the strength of this 
relation decreases as the absolute magnitude 
of rater reliability increases. This hypothesis 
would also explain why a strong relationship 
was found for introductory psychology Ss, 
while this relation was less apparent for more 
educationally advanced Ss (4), since the less 
naive raters also displayed greater rater re- 
liability. 

The decrease in rater bias with longer scales 
contrasts markedly with the opposite result 
noted by Bendig and Sprague (4). No ready 
explanation of this discrepancy, other than 
the obvious chance or case differences between 
the two researches, appears plausible. Fur- 
ther evidence is needed on this point. 


Summary 


Introductory psychology Ss (N = 120) 
rated for adjustment level long and shortened 
versions of ten case histories using scales with 
either 5, 7, or 9 categories. No differences 
were found between the long and short cases 
in measures of rater reliability, but a factor 
analysis of the intercorrelations of mean case 
ratings among the subgroups, and with the 
case ratings of psychiatrists, indicated that 


the ratings of the short cases were slightly 
more similar to the psychiatric judgments 
than were the judgments of the longer case 
histories. Rater reliability increased slightly, 
but nonsignificantly, between the 5- and 7- 
category scales, while no difference in re- 
liability between the 7- and 9-category scales 
was evident. Rater bias tended to decrease 


with the longer scales. 


Received April 5, 1955. 
Some Evidence on the Validity of the Sarason 
Test Anxiety Scale 


Barclay Martin and Bruce McGowan 


University of Wisconsin¹ 


Sarason (1) has developed a scale designed 
to measure how anxious a person feels in test- 
ing situations. This paper reports further evi- 
dence bearing on the validity of this test. 

Two extreme groups of 18 Ss each were 
selected from a class of 340 students, on the 
basis of a slightly modified Test Anxiety 
Scale, and then seen twice in individual ses- 
sions. Both sessions consisted of a 4-minute 
rest period followed by a 5-minute discussion 
of their feelings and attitudes about test-tak- 
ing. Readings of palmar skin conductance 
were taken at regular intervals. The first ses- 
sion occurred early in the semester when no 
course examinations were being given, and 
the second session occurred within the hour 
preceding a course examination. Three things 
stand out in the results: (a) with the excep- 
tion of the Session 1 rest period, the high- 
anxiety group had a significantly higher skin 
conductance for all periods than the 
low-anxiety group (p < .05); (b) it is also 
apparent that both groups tended to be 
relatively less apprehensive in the second session 
as indicated by decreases in skin conductance 
for both rest and discussion periods, although 


¹ An extended report of this study may be 
obtained without charge from Barclay Martin, 
Psychology Dept., Univer. of Wisconsin, 600 N. Park St., 
Madison 6, Wis., or for a fee from the American 
Documentation Institute. Order Document No. 4641 
from ADI Auxiliary Publications Project, Photo- 
duplication Service, Library of Congress, Washing- 
ton 25, D. C., remitting in advance $1.25 for micro- 
film or $1.25 for photocopies. Make checks payable 
to Chief, Photoduplication Service, Library of Con- 
gress. 


these decreases tended to be of only border- 
line significance; (c) there were no signifi- 
cant differences in relative change between 
the high- and low-anxiety groups from Ses- 
sion 1 to Session 2, and also no significant 
differences in relative change from rest to 
discussion period in either session. 

These results suggest that Sarason’s scale 
reflects partly a general anxiety factor which 
accounts for the general differences in con- 
ductance level between the groups. It can fur- 
ther be concluded that apprehension about the 
experimental session itself overshadowed con- 
cern about the course examination for both 
groups, as shown by the decrease in conduct- 
ance from first to second session, presumably 
as a result of familiarity with the situation 
and knowing what was likely to happen in the 
session. These results raise the question of 
how specific to testing situations is the anx- 
iety that the Sarason Scale measures, since 
there was no differential tendency for high as 
compared with low test anxiety Ss to be more 
anxious right before a test than otherwise. It 
may be that most of the differences found in 
performance between high and low groups 
are due to a general anxiety factor rather 
than to specific test anxiety. 


Brief Report 
Received July 14, 1955. 
Movement as a “Rhetorical Embellishment” 
of Human Percepts 


Leon H. Levy 
Indiana University 


Although the human movement response, 
M, has played a major role in Rorschach test 
interpretation—Beck describes it as opening 
“to investigation a sector of personality that, 
effective as it is in determining the individu- 
al’s course, has been equally elusive of efforts 
to study it objectively” (2, p. 22)—few at- 
tempts have been made to test Rorschach’s 
assumption regarding its nature. Thus most of 
the interpretive hypotheses based on amount 
of M in a Rorschach protocol are still pre- 
dicated upon the assumption that M repre- 
sents a projection by the subject of certain 
kinesthetic experiences into the blot (2, 3, 
8, 9). 

There is no dearth of speculation concern- 
ing how kinesthesis may be stimulated by the 
inkblot, or other static visual stimuli, for that 
matter (1, 7, 12). But as yet no empirical 
data confirm the occurrence of the conditions 
necessary for kinesthesia, viz., changes in the 
musculature of the individual. Indeed, as Rust 
(7) pointed out, Rorschach cautioned against 
scoring M in many instances even when the 
subject resorted to gestures to communicate 
his percept. It is therefore doubtful whether 
M and non-M responses will ever be distin- 
guished on other than a verbal basis. But 
since Rorschach specifically differentiated be- 
tween M responses resulting from “movement 
sensed in the figure” (6, p. 25) and non-M 
which are “only a rhetorical embellishment of 
the answer, a secondary association” (6, p. 
25), it is clear that the scoring criteria for M 
must include some nonverbal measure. Since 
nonverbal indices did not prove expedient, 
Rorschach and those who have followed him 
have reasoned that one could sense move- 
ment only if one perceived the movement as 


occurring in a human figure, so that only hu- 
man movement is scored M. The scoring is 
based on the concept of empathy, whereby it 
is assumed that a person cannot empathize 
with animals or inanimate objects, but only 
with other humans. Since the concept of 
empathy is itself not clearly defined, the prin- 
ciple does little to clarify theory or practice. 

To those who prefer concepts somewhat 
better defined than M is, there appear to be 
two alternatives. The one is to discard the 
term entirely, while the other is to redefine it 
in terms which are more verifiable. In many 
instances the former alternative is to be pre- 
ferred. In the case of M there is some evi- 
dence (4, 5, 10, 11), albeit not entirely un- 
equivocal, of meaningful relationship to other 
psychological phenomena, so that the more 
conservative approach would appear justified. 
M may be a valuable correlate of certain psy- 
chological phenomena regardless of whether 
or not Rorschach’s conception of its nature 
was correct or whether or not his interpreta- 
tion is testable. If some more adequate con- 
ceptualization of M is possible, its interpre- 
tive value may be further enhanced. 

The explanation offered in this paper is one 
with which Rorschach would undoubtedly 
disagree. It is proposed that M is best un- 
derstood as a “rhetorical embellishment” or 
“secondary association” to use Rorschach’s 
own terms; that it is an epiphenomenon of 
the tendency to perceive human figures in the 
stimulus material. 

This view of M is predicated on the hy- 
pothesis that when a person is asked to de- 
scribe a human figure or a stimulus taken as 
representing a human figure, there is a high 
probability that movement of one kind or 
another will be among the attributes cited. 
Describing any object is accomplished by listing 
its attributes. Since one of the most fre- 
quently reported attributes of the human or- 
ganism is movement, the taking of some po- 
sition, engaging in some kind of activity, etc., 
it should be expected that these attributes will 
be among those cited by the individual in his 
report of his perception of any visual stimu- 
lus he takes to represent the human figure. 


Hypothesis 


The frequency of occurrence of verbal re- 
sponses classifiable as movement responses in 
subjects’ reports of their perception of cer- 
tain visual stimuli having no common cul- 
tural interpretation will be directly related to 
whether or not these stimuli are presented as 
representing human figures. 


Procedure 


The 112 subjects (Ss) used in this study 
consisted of all those members of four discus- 
sion sections of an introductory psychology 
course who were present on the day of the ex- 
periment. 

Ten stimulus figures were constructed by 
combining three lines: two one and one-half 
inch lines joined to form a 60-degree angle, 
and one two-inch line drawn so that one of 
its end points was the apex of the angle 
formed by the other two lines. The figures 
were varied by varying the angle of the longer 
line in relation to the vertical, and also by 
varying the size of the angle formed by the 
longer line and either of the two shorter lines. 
The figures were presented to Ss in booklet 
form with one stimulus to a page. 

The Ss were divided into two groups of 56 
each, each group consisting of two of the dis- 
cussion sections. Group H received booklets 
with the following instructions on the front 
page: 

This is a test to see how imaginative and original 
you can be. In the booklet before you, you will find 
a series of ten figures composed of three lines in 
various arrangements. Each of these figures is meant 
to represent a person. You are to use your imagina- 
tion and describe in one sentence under each figure 
what the figure might represent. Of course there are 
no right or wrong answers to this; the important 
thing is to give the most imaginative descriptions 
possible. 


The Ss in Group NH received identical in- 
structions except that the italicized sentence 
was deleted. In each case these instructions 
were also repeated orally by the experimenter. 
All responses were scored for M using Klop- 
fer’s criteria (3). 

Results 


The mean number of M responses for 
Group H was 8.62 as compared with 1.12 for 
Group NH. Because of the nonnormal na- 
ture of the distribution, chi square was used 
in testing the hypothesis. Chi square was 
calculated for a 2 × 2 table comparing the 
number of cases in each group falling above and 
below the median of the total sample, 3.5. 
With Yates’s correction for continuity, chi 
square was 82.312, significant beyond the 
.001 level. 
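
The median test described above can be sketched as follows: each group's scores are split at the median of the total sample, and chi square with Yates's correction is computed on the resulting 2 × 2 table. The cell counts below are hypothetical, because the report gives only the group means and the final chi square, so they are not expected to reproduce the value of 82.312; the function names are illustrative.

```python
def median_split_counts(scores, cut):
    """Count scores above the cut and at or below it."""
    above = sum(1 for s in scores if s > cut)
    return above, len(scores) - above


def yates_chi_square(a, b, c, d):
    """Chi square (df = 1) with Yates's correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2.0, 0.0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts for two groups of 56 Ss each:
# rows = Group H, Group NH; columns = above, below the total-sample median.
chi_sq = yates_chi_square(53, 3, 3, 53)
print(round(chi_sq, 2))  # 85.75 for these invented counts

# Critical value of chi square with 1 df at the .001 level.
print(chi_sq > 10.828)
```

Because the median of the total sample was 3.5 and M counts are whole numbers, no score falls exactly on the cut, so every S is classified unambiguously as above or below.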

Discussion 

The results seem to support the interpreta- 
tion that amount of M in a Rorschach test 
protocol is a function of the tendency of the 
individual to perceive human figures in am- 
biguous visual stimuli. In one sense, the find- 
ing comes as no surprise since M can only be 
scored for human percepts, but it causes one 
to question what more, if anything, can be 
said regarding the nature of M. 

What is to become of the various 
interpretations which are dependent upon M? To 
the extent that interpretations are dependent 
only upon rational exposition and lack em- 
pirical justification, they may be discarded 
with impunity. To the extent that they have 
been found to be empirically valid, they pose 
new and interesting research problems center- 
ing around the question: What is the psycho- 
logical significance of a tendency to perceive 
human figures in ambiguous visual stimuli? 
Since it appears from the findings that M 
does not account for all of the variance of H, 
further research might be directed toward 
studying instances where an individual gives 
a non-M human response. 

The “scoring” of a Rorschach test protocol 
might best be construed as a problem in the 
analysis of verbal behavior. The problem 
faced in Rorschach test research is one of 
determining what the most useful units of 
analysis shall be and how they are related to 
other psychological phenomena. The findings 
suggest that unless M can be shown to be 
consistently related to other psychological 
events in a way which is independent of 
other scores, particularly the content score H, 
it might best be discarded. The result should 
be a simpler classification system of the same 
analytic power. 


Summary 


An experiment was performed to test the 
hypothesis that the tendency to give verbal 
responses scoreable as human movement, M, 
by Rorschach test scoring standards, is di- 
rectly related to the extent to which the in- 
dividual takes the visual stimulus to represent 
a human figure. The justification for scoring 
both M and H in Rorschach test analysis 
was questioned in the light of findings con- 
firming the hypothesis. 


Received March 14, 1955. 
The Measurement of Primary Mental Abilities 
by the Columbia Mental Maturity Scale¹ 


Cyril R. Mill and Charles J. Turner 
Richmond Public Schools, Richmond, Virginia 


The Columbia Mental Maturity Scale 
(CMM), published in 1954, was designed to 
measure the mental ability of children in the 
MA range from 3 to 12 years. Since the only 
response required is to point, it was thought 
suitable for use with children afflicted with 
cerebral palsy or other handicaps involving 
motor or verbal functioning. Administration 
and scoring are simple and testing time runs 
about 15 minutes. It is a potentially useful 
test for a quick measure of intelligence in 
clinic and school use with normal children. 
This study investigated the validity of the 
CMM IQ’s, factors of intelligence measured 
by the CMM, and age levels at which it 
functions best. 

Seventy-five normal (nonhandicapped) chil- 
dren, equally divided by sex, from ages 7-0 
to 11-11 years, were administered the CMM 
and the SRA Primary Mental Abilities tests. 
CMM IQ’s correlated .63 with PMA Total 
IQ’s and .62 with PMA Non-Reading IQ’s. 
It was therefore concluded that the CMM 
did not provide a validity coefficient in this 
instance sufficiently high to warrant its use 
as an individual measure of intelligence. 

Correlations between CMM and the five 


¹ An extended report of this study may be 
obtained without charge from Cyril R. Mill, 407 N. 
12th Street, Richmond, Va., or for a fee from the 
American Documentation Institute. To obtain it 
from the latter source, order Document No. 4679 
from ADI Auxiliary Publications Project, Photo- 
duplication Service, Library of Congress, Washing- 
ton 25, D. C., remitting in advance $1.25 for micro- 
film or $1.25 for photocopies. Make checks payable 
to Chief, Photoduplication Service, Library of Con- 
gress. 


subtests of the PMA were as follows: with 
Reasoning, .61; with Verbal Meaning, .51; 
with Perception, .48; with Number, .41; with 
Space, .31. These r’s suggest that the CMM 
is most heavily weighted with a measure of 
reasoning and understanding of ideas. It con- 
tains a moderate measure of numerical ability 
and ability to perceive small details quickly 
and accurately, and there is a slight measure 
of space perception. 

The mean CMM IQ’s were higher than 
mean PMA Total IQ’s at all age levels tested. 
The differences were not significant at the 
nine- and ten-year levels. The CMM IQ’s 
were significantly higher for the eight year 
olds (5% level) and for the 7 and 11 year 
olds (1% level). The clinical impression was 
that an excessive number of numerical anal- 
ogy problems was used, with the result that 
some children learned to solve the tasks on 
the analogy principle without regard to the 
content. 

The SD of the CMM IQ’s was twice as 
large as the SD of the PMA Total IQ’s. 
Scores for boys were significantly more vari- 
able than for girls, possibly due to differ- 
ences in social maturity of the sexes at these 
age levels. 

This study indicates that the CMM meas- 
ures reasoning and verbal meaning even 
though it is a nonverbal test. It is not suffi- 
ciently valid at present for clinical use, but 
with further refinement it could fill a real 
need for a quick IQ measure for both nor- 
mal and afflicted children. 


Brief Report. 
Received August 23, 1955. 
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Reactions to Rorschach Cards IV and VII as a 
Function of Parental Availability in Childhood 


Ralph Hirschstein and Albert I. Rabin 


Michigan State College 


Rorschach workers have for some time been 
aware of the symbolic value and interpreta- 
tional importance of individual inkblots. Par- 
ticular attention has been paid to Cards IV 
and VII as symbolizing the father and mother 
figures, respectively. Despite this widespread 
practice, there is little explicitly stated in the 
literature in support of this symbolic 
interpretation. Bochner and Halpern (1), 
speaking of Card IV, state that “The heavy male 
figure may suggest the father or authority. 
. . .” In Card VII they see “. . . a feminine 
quality, frequently with maternal implica- 
tions.” More recently, some experimental evi- 
dence has been marshalled in support of the 
symbolic meaning of the aforementioned Ror- 
schach cards (2, 3). This evidence is based 
upon direct questioning of subjects regarding 
the father and mother figures which might be 
represented by the inkblots. The results are 
based entirely on conscious association and 
critical reflection of the individuals tested. 

The purpose of the present investigation 
was to make an indirect check on the sym- 
bolic value of Cards IV and VII of the Ror- 
schach series. The main hypothesis was that 
individuals who are adolescent delinquents 
and in whose early background there were no 
significant mother or father figures would re- 
act more readily and more easily to these 
cards than would a similar group of delin- 
quents who grew up in the standard family 
situation and who had the opportunity of ac- 
quiring conceptions of father and mother fig- 
ures, disturbing as they may be. 


Method and Subjects 


The subjects of the present study were 40 
male juvenile delinquents, divided into two 


groups of 20 each. The two groups were 
matched for age and intellectual level of 
functioning, the mean ages being 15.2 and 
15.3 years, respectively, and the mean IQ’s 
97.6 and 96, respectively. All subjects had 
been placed by the courts at the Boys’ Vo- 
cational School at Lansing, Michigan. None 
of the delinquents used in this investigation 
revealed any psychotic manifestations or or- 
ganic involvement upon psychiatric examina- 
tion. Group NF (Non-Family) consisted of 
twenty youngsters who had no opportunity 
of living with their natural parents and who 
had been moved around from one foster home 
to another since infancy. The F (Family) 
group consisted of twenty youngsters who had 
been living with their own families (parents) 
continuously until the time of their arrest. 
Causes precipitating arrest were similar for 
both groups. Most common reasons for arrest 
were in order of frequency: car theft, break- 
ing and entering, and school truancy. 

Rorschach records were obtained on all 
subjects under standard conditions of exami- 
nation. Analysis of the data concentrated on 
the differences between the two groups with 
respect to Cards IV and VII. In accordance 
with our hypothesis, the expectations were 
that the “family” group would show greater 
difficulty in responding to Cards IV and VII 
than the “non-family” group. The underlying 
assumption is that part of the delinquents’ 
difficulty lies in the intrafamily relationships 
which may be reflected in Cards IV and VII. 
This should not prevail for the “non-family” 
group who had no definite parental relation- 
ships. 

Results 


The two groups were compared with respect 
to form quality (F + %), productivity (R), 
Table 1 


Significance of the Difference of Percentage of the Total 
Number of Responses to Cards IV and VII 


Family    Non-family             Level of 
group     group          t       significance 
16%       18.5%          1.68    10% 


and “reaction time to first response” on Cards 
IV and VII. A comparison of the F + %, for 
Cards IV and VII, yielded no significant dif- 
ferences. When the mean productivity for 
Cards IV and VII, combined, was compared 
for the two groups, a slight trend favoring 
the “family” group was noted. (See Table 1.) 

A comparison of reaction times to the 
individual Cards IV and VII indicated 
differences in the expected direction. The mean 
reaction times for both groups, for all ten 
Rorschach cards, are given in Table 2. The only 
differences that were found to be statistically 


Table 2 


Small Sample Tests of Significance of Difference 
of Means of Reaction Times to All Ten 
Rorschach Cards for the Family 
and Non-Family Groups 


        Family       Non-family 
        group        group                   Level of 
Card    Mean R.T.    Mean R.T.      t        significance 
1       20.55        15.20          .56      insignificant 
2       17.20        13.50          .59      insignificant 
3       13.10         6.80          .95      insignificant 
4       31.85        15.90         2.01      2%-3% 
5       13.70         9.45          .89      insignificant 
6       32.85        30.05          .31      insignificant 
7       22.45        12.85         1.91      5% 
8       19.15        17.85          .11      insignificant 
9       26.90        21.95          .07      insignificant 
10      16.80        13.55          .95      insignificant 


significant are those obtained on Cards IV 
and VII. The “family” group reacted more 
slowly to these cards than did the “non- 
family” group. 

It appears that the youngsters with ordi- 
nary family background were “shocked” or 
blocked when responding to Cards IV and 
VII; the youngsters with no parental back- 
ground did not show this blocking. It would 
seem that one can differentiate youngsters 
lacking a stable parental figure from those 
possessing one, on the basis of reaction time 
to Cards IV and VII. 


Summary 


Two groups of male delinquent subjects, 
matched for age and intelligence, but differ- 
ing with respect to parental availability in 
early childhood, were compared on the basis 
of their reactions to Rorschach Cards IV and 
VII. The youngsters who grew up with their 
natural family reacted significantly more 
slowly to these cards than did the group who 
had no real mother or father figure to identify 
with. Furthermore, a slight trend in the di- 
rection of restricted productivity to these 
cards, was noted in youngsters who had grown 
up with their natural parents. These results 
are offered as further evidence in support of 
the hypothesis that Cards IV and VII sym- 
bolize parental figures. 


Received March 22, 1955. 
Rorschach Content and Hostile Behavior¹ 


Martin R. Gluck 


Mental Hygiene Consultation Service, Fort Lewis, Washington 


A question of major concern in the use of 
projective techniques, and one which has pro- 
voked arguments both pro and con, is that 
of the relationship between the content of 
such protocols and the testee’s overt be- 
havior. Piotrowski (8), Wittenborn (11), and 
Rosenzweig (9), for example, either state or 
imply that direct prediction from projective 
records to overt behavior is possible. Klopfer’s 
most recent Rorschach manual (7) discusses 
these problems of prediction and content 
analysis much more fully than did his older 
book (6), while Beck (1), too, in one of his 
recent reviews of the Rorschach technique, 
stresses the need for further research into the 
meaning and importance of content. Lastly 
the author’s work-a-day experience suggests 
that users of projective tests, both psycholo- 
gists and psychiatrists alike, frequently tend 
to make such predictions and to act in ac- 
cordance with them. 

Hostility, especially its direction and 
strength, is one of the major dynamic forces 
which the projective tester may be called 
upon to assess through the medium of the 
Rorschach technique. The present study is 
therefore focused on the question: Can hostile 
(aggressive) behavior be predicted from the 
amount of hostility (aggression) found in the 
content of Rorschach protocols? 


1 Based on a dissertation submitted to the Grad- 
uate School, University of Pittsburgh in partial ful- 
fillment of the requirements for the degree of Doctor 
of Philosophy. The author acknowledges the help 
and guidance of his dissertation committee, and 
especially thanks Doctors A. David Lazovik and A. 
W. Bendig. Thanks are also due to the members of 
the Psychology Section, Neuropsychiatric Service, Fitz- 
simons Army Hospital, and to Mrs. Elizabeth J. 
Scheide for rescoring the Rorschach records. The 
statements and conclusions of this study are the 
author’s and do not necessarily reflect the opinion 
or policy of the Army Medical Service. 
Prior research of this “predictive” kind has 
used criteria which have differed widely, 
not only from this paper but from one an- 
other. In an early study Young and Hig- 
genbotham (12) searched for agreement in 
major psychological areas between Rorschach 
protocols and a large mass of behavioral data 
(e.g., case histories, counselors’ records, etc.), 
concluding that, although single determinants 
and simple ratios were not very useful, the 
Rorschach does reflect “ . . . potential forces 
and tensions within the individual . . . the 
resolution of [which] appears in behavior rela- 
tive to specific environmental stimuli” (12, p. 
93). 

Elizur (3), with a scoring system for 
evaluating both anxiety and hostility as ex- 
pressed in Rorschach content, compared the 
amount of projected hostility with measures 
of other kinds of hostility based on a 
questionnaire, a self-rating scale, and the 
judgments of observers watching the subjects’ 
behavior in an interview situation. He reported 
significant correlations between all of the 
variables used and, in addition, demonstrated 
that the use of the content scores could dis- 
tinguish between the records of normal sub- 
jects and neurotic ones. Following Elizur’s 
lead, Walker (10) constructed a more refined 
scale and reported significant correlations be- 
tween Rorschach content scores and thera- 
pists’ evaluations of hostility in their patients. 


Problem and Definitions 


Because of the divergence of the criteria 
used in the studies reviewed and also because 
of the differing kinds of scale systems used 
to evaluate Rorschach content, it seemed 
useful to carry out a further study in which 
the criterion situation (and the “predictor” 
scale) were tailored to the specific problem
being investigated. 

Therefore, the following three hypotheses 
were formulated. 

1. The more overt hostility a subject ex- 
presses in his projective records, the more he 
will react with hostile behavior when placed in 
a situation which is provocative of such be- 
havior. 

2. The more covert hostility a subject ex- 
presses in his projective records, the less he 
will react with hostile behavior when placed in 
a situation which is provocative of such be- 
havior. 

3. Subjects who have a large amount of 
hostility of both kinds in their records will 
react with more hostile behavior than subjects 
whose total amount of hostility (in their 
records) is relatively less. 

“Behavior” appears here in the sense of 
gross actions or activities which are readily 
observable by another individual. “Hostility” 
is also used in the broadest fashion; that is, 
it is meant to connote feelings and/or actions 
carrying the idea of such words as anger, 
hatred, resentment, irritation, and exaspera- 
tion. 


Method 


Subjects in this research were thirty pa- 
tients of the Neuropsychiatric Service, Fitz- 
simons Army Hospital, who were given a 
standard test battery as an integral part of 
their individual psychological evaluations; the 
Rorschach was administered to each subject 
individually and in each examiner’s usual 
manner. The examiners were all psychology 
interns in training at the hospital. The author 
administered half of the total number, while 
the remaining protocols were obtained by the 
other four interns. 

After the psychological testing was com- 
pleted, the subjects were put into a stress 
situation designed and administered in such 
a way as to be extremely provocative of 
hostile behavior. There were nine tasks in 
the stress situation, wherein the subject’s be- 
havior and reactions to each were rated at 
the conclusion of the task by each of three 
observers. The final criterion score (of hostile 
behavior) was a composite derived from the 
observers’ pooled judgments of each subject
on three of the tasks. Statistical tests indicated
that the observers’ judgments of hostility
were reliable and discriminating.²


The Rorschach (Hostility) Content Rating 
Scales 


Although there was some recognition of the 
difference between openly expressed and in- 
directly or covertly implied hostility in both 
Elizur’s and Walker’s work, these writers 
did not directly attack the problem as such. 
Consequently, it was decided to attempt an 
appraisal of these seemingly (qualitatively) 
different expressions of hostility. Using ex- 
amples and definitions gleaned from the afore- 
mentioned two studies and from De Vos’s (2) 
paper, scales were constructed, for both Overt 
and Covert expressions, with values ranging 
from 0 to 2. Before the scales were useable, it 
was found necessary to redefine the points 
on the scales jointly with the other rater. 
Sample responses from the sources mentioned 
above were employed as examples for the 
various scale points. This joint defining of the 
scale points should be borne in mind when 
evaluating the reliability figures presented in 
Table 1. 
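
The text later attributes the combined-rating reliabilities of Table 1 to the Spearman-Brown formula applied to the two raters' summed scores. A minimal sketch of that step, assuming the independent-rating figures of .83, .67, and .68 are inter-rater correlations and that the standard two-rater form of the formula was used (the function name is illustrative, not the author's):

```python
def spearman_brown(r_single: float, k: int = 2) -> float:
    """Reliability of the sum of k parallel ratings, given the
    reliability (here, the inter-rater correlation) of one rating."""
    return k * r_single / (1 + (k - 1) * r_single)


# Independent-rating reliabilities as reported in Table 1.
independent = {"Overt": 0.83, "Covert": 0.67, "Total": 0.68}

# Stepped-up reliability for the two raters' combined scores.
combined = {scale: round(spearman_brown(r), 2) for scale, r in independent.items()}

for scale, value in combined.items():
    print(f"{scale}: independent {independent[scale]:.2f} -> combined {value:.2f}")
```

Because the two raters defined the scale points jointly, these stepped-up values are probably best read as upper estimates of the agreement to be expected between raters working independently.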

“Overt” responses are those in which hos- 
tile, aggressive feeling is plainly conveyed by 
the content, thereby evoking little or no inter- 
pretation by the clinician before he labels 
these responses as hostile ones. Such responses 
have content which carries a commonly accepted
(i.e., “everyday”) hostile or aggressive
connotation. 

“Covert” responses are those in which the 
hostility is perceived by means of symbolic 
interpretation, and which have content that 
does not usually convey a commonly accepted 
connotation of hostility. Examples of each 
point on both of the scales used are presented 
below. 


For the Overt Scale responses such as “Bulls 
charging each other,” “Two fellows in a duel— 
really fighting—you can see the sabres,” and “Ani- 
mals shot” received a rating of 2 (Most Intense). 

A rating of 1 (Less Intensive) was given for such 
responses as “Spears used by Indians,” “A quarrel- 
some person,” “Pistol,” “Arrow,” “Gun,” and “Knife.”

Similarly, a 2 rating on the Covert Scale was
given for responses like “Drunk who fell on his
back,” “Birds dressed in men’s clothing,” and “Bat
with a torn wing.”

The Covert rating of 1 was applied to responses
similar to “A volcano,” “Teeth,” and “Torn leaves.”

The foregoing are not all of the examples used to
define the scale points. In addition to further examples,
the scales in their final form included statements
of definitions of each of the rating points.


2 A somewhat more detailed discussion of the
criterion measure has been previously reported (5).


Each rater scored the full Rorschach protocols,
scoring each response on whichever
scale was appropriate. By adding the scores
given on each of the scales, the subjects received
three content hostility scores: an Overt
score, a Covert score, and a Total score
(Overt plus Covert scores). The reliability
coefficients presented in Table 1 were computed
from these three sums. The final Rorschach
Hostility Scores were obtained by
summing each of the three scores arrived at
by each of the two raters, permitting the use
of the Spearman-Brown formula (4, p. 390)
and resulting in heightened reliabilities for the
combined ratings, and then dividing the combined
score for each scale by the number of
responses in the record. In recognition of the
fact that the significance of a response or
series of responses is frequently judged in
terms of the length of the record, it was felt
that the use of such a ratio score was desirable.


Table 1

Reliabilities of Rorschach Scales
(N = 30)

Rating                 Overt   Covert   Total
Independent Ratings     .83     .67      .68
Combined Ratings        .91     .80     [.81]


The correlation between the Overt and
Covert Rorschach Hostility scores was .33, a
value which approaches significance at the .05
level. Apparently, the two scales used were
not independently tapping qualitative differences.


Results and Discussion


The results of the comparison between
Rorschach Hostility Scores and the Criterion
Score (composite rating of behavioral expressions
of hostility in the stress situation) show
no significant correlations, the correlation coefficients
between the Criterion Score and the
Overt, Covert, and Total scores being .07,
.18, and .16.

Contrary to the previous studies already
cited, none of the three hypotheses of this
[study was supported by] these data.

The first consideration which comes to mind in
comparing these findings with those of previous
investigators is the difference in the
criterion measures used. Walker worked with
material derived from therapy, while Elizur
used other test data and judgments from a
relatively benign situation. The question then
arises which, if any, of the methods is most
appropriate or useful from the point of view
of their relations to the patients’ environmental
situations. Walker’s approach has
merit from the standpoint of a therapist asking
for help in planning the therapeutic situation,
but neither his nor Elizur’s criteria seem
to bear directly on broader life situations, e.g.,
those incidents where the patient is faced by
the threat of a strong frustration and/or an
authority figure.
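
The significance statements about the reported correlations can be checked with the usual t test for a product-moment r. The sketch below assumes a two-tailed test at the .05 level with N = 30 (28 degrees of freedom); the text does not say which form of test the author used, and the critical t of 2.048 is taken from a standard t table.

```python
import math

N = 30                # number of subjects reported in the study
DF = N - 2            # degrees of freedom for testing r
T_CRIT_05 = 2.048     # two-tailed .05 critical t for 28 df (standard table value)


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a correlation coefficient against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit: float, df: int) -> float:
    """Smallest absolute r that reaches the given critical t."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


r_crit = critical_r(T_CRIT_05, DF)

# Correlations as reported in the text.
reported = {
    "Criterion vs. Overt": 0.07,
    "Criterion vs. Covert": 0.18,
    "Criterion vs. Total": 0.16,
    "Overt vs. Covert": 0.33,
}

significant = {name: abs(r) >= r_crit for name, r in reported.items()}

print(f"critical r (two-tailed, .05, df = {DF}): {r_crit:.3f}")
for name, r in reported.items():
    print(f"{name}: r = {r:.2f}, t = {t_from_r(r, N):.2f}, significant: {significant[name]}")
```

Under these assumptions the critical value is roughly .36, so the three criterion correlations fall well short of it, while the .33 between the Overt and Covert scales comes close without reaching it.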

It was felt that in the research reported 
here the presence of a hostile and hostility- 
provoking authority figure, such as was the 
administrator of the stress situation, might 
well make the situation similar to those in 
which patients frequently have difficulty ad- 
justing. However, this strong authoritarian 
examiner may well have provoked the anxiety 
necessary to inhibit the hostile action at the 
same time that he was provoking hostile 
feelings within the subjects. 

This leads directly to a second considera- 
tion, i.e., the overlap of the functioning of the 
Overt and Covert Scales. As was noted earlier, 
the Rorschach content scales used here are 
not nearly as independent as one might wish. 
Both the author and the second scorer were 
impressed with the possibility that the sample 
responses might reflect anxiety feelings in ad- 
dition to those of hostility in a kind of in- 
termingling that has been noted by De Vos 
(2) and Elizur (3). Such a mutual relation- 
ship would have much significance for the 
type of prediction attempted in the present 
study, for without considering the role of 
anxiety as an inhibiting (as well as provoking) 
agent it appears that adequate prediction will 
not occur. 


Conclusions


From the foregoing data it is concluded 
that simple assessment of the amount of hos- 
tility contained in the content of a Rorschach 
protocol does not provide an accurate index 
to the patient’s proclivity or ability to behave 
in a hostile manner in what seems to be a 
hostility-provoking situation. 

Other pertinent factors related to the 
amount of anxiety present, effective controls 
available (e.g., Form Level, quality of M, 
FC to CF, etc.), and direction or target of 
hostility will have to be more precisely de- 
scribed before we will have accurate predic- 
tion of the type sought for here. In this re- 
spect the writer differs with Korner’s view 
(in Klopfer 7, pp. 490-491) that projective 
techniques are most valuably used in “ . . .
fantasy exploration . . . ” and concomitantly
that prediction of behavior is not necessarily 
desirable. Instead, the orientation expressed 
in Ainsworth’s introduction to the same chap- 
ter (7, Ch. 14) is more congenial to that of 
this paper inasmuch as it is felt there is use- 
fulness in research on the predictive abili- 
ties of interpretive hypotheses. Although the 
results presented here certainly evoke reser- 
vations about such an application of content 
rating scales, the technique used in establish- 
ing a criterion measure does seem to afford 
a workable approach to the problem of re- 
ceiving reliable judgments of “emotional be- 
havior.” 


Summary 


1. Hypotheses relating hostile Rorschach 
content to hostility expressed through behav- 
ior were stated. 

2. Subjects were given a Rorschach ex- 
amination and then put into a hostility-pro- 
voking stress situation, where the amount of 
hostility displayed behaviorally was rated by 
three judges. 

3. The content of the Rorschach records 
was rated on scales which were designed to 
assess amount and type of hostile content. 


4. Correlations between scores from the 
Rorschach content scales and the subjects’ 
behavior were found to be nonsignificant. 

5. Pertinent points were considered in relation
to these findings and suggestions for
the direction of future research offered. 

6. It was concluded that the hypotheses 
offered were not supported by the data re- 
sulting from the study. 


Received May 2, 1955. 


A Statistical Comparison of the WISC and
Wechsler-Bellevue, Form I ¹


John R. Price and Gareth D. Thorne ²


University of Denver 


Few studies making a comparison between 
the WISC and Wechsler-Bellevue Form I in 
an experimental situation have been reported. 
The need for such studies seems apparent 
insofar as both instruments are being used 
currently on the assumption that they are 
equivalent in the overlapping age range, 10 
to 15 years. In clinical situations the choice 
of which of these two tests to use with chil- 
dren in this age range has been left to the 
judgment of the E who has little recourse to 
comparative studies to aid in making this de- 
cision. Because of the widespread use of these 
two scales in clinics and school systems, it is 
felt that the assumption of equivalence war- 
rants further investigation. The hypothesis as 
set forth in this investigation is that the 
WISC and the WB Form I are not equivalent 
instruments with respect to the Verbal, Per- 
formance, and Full Scale scores within the 
aforementioned age range. 

Graham (2) dealt with the problem of 
equivalence insofar as his sample of unsuc- 
cessful readers had been evaluated in part 
through the use of these two tests. This in- 
vestigation, however, was primarily con- 
cerned with subtest analysis and did not sta- 
tistically treat the Scale scores. In a study 
using mentally retarded children, Vanderhost 
et al. (4) used a design similar to the one 
employed in the present investigation and 
made comparisons of the means of each Scale 
score with every other Scale score of the
tests used. In an investigation with normals,
Delattre and Cole (1) were primarily interested
in various comparisons of subtest scores
but reported the obtained Scale scores in
their article.


1 The writers wish to express their indebtedness to
Drs. E. Ellis Graham, Joel E. Greene, and William
L. Sawrey whose helpful advice and assistance made
this study possible.

2 Mr. Thorne is at present Assistant Director of
Training at the Caswell Training School, Kinston,
North Carolina.


Method 


It was considered advantageous to employ 
an experimental design that would permit the 
use of one group of Ss for both tests. In or- 
der to prevent practice effect from either test 
influencing the results, the test to be ad- 
ministered first to each S was determined us- 
ing a table of random numbers with the re- 
striction that each test be administered first 
an equal number of times. 
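
The restricted randomization described above can be sketched as follows. The authors used a printed table of random numbers; the seeded pseudo-random shuffle here is a stand-in, and the function name and subject numbering are illustrative assumptions.

```python
import random
from collections import Counter


def assign_first_test(subject_ids, seed=None):
    """Randomly decide which test each subject takes first, with the
    restriction that each test is given first equally often."""
    ids = list(subject_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of subjects is needed for equal halves")
    half = len(ids) // 2
    orders = ["WISC"] * half + ["WB"] * half
    rng = random.Random(seed)  # seeded only so the example is repeatable
    rng.shuffle(orders)
    return dict(zip(ids, orders))


# Forty subjects in one age group, numbered 1-40 for illustration.
assignment = assign_first_test(range(1, 41), seed=1955)
counts = Counter(assignment.values())
print(counts)
```

Shuffling a list that already contains each test the required number of times enforces the equal-frequency restriction directly, which is simpler than drawing independently for each subject and rejecting unbalanced outcomes.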


The criterion for selecting Ss was solely chronological
age. The lists of Ss were compiled in such a manner
that at the anticipated time of testing each S
would be at least 11 years, 4½ months or 14 years,
4½ months of age respectively, and at most 11 years,
7½ months and 14 years, 7½ months in each age
group. The sample was selected from local school 
systems and consisted of 40 white, American chil- 
dren at each age level. Each group consisted of 20 
boys and 20 girls. These groups were divided so that 
both Es administered tests to 10 boys and 10 girls 
at each age level with each E administering both 
tests to each S in his group. Those Ss who had 
taken either the WB Form I or the WISC previ- 
ously were dropped from the group. 

Both tests were administered to the S during one 
session with a standard rest period of 15 minutes be- 
tween tests. In the actual testing all the subtests of 
both the WISC and WB were employed with the 
exception of the Mazes subtest on the WISC. Scor- 
ing of each test was done by the E who had ad- 
ministered the test. In the case of questionable re- 
sponses on such subtests as Comprehension, Simi- 
larities, etc., a consultation was held between Es in 
order to arrive at a score that was satisfactory to 
both Es in terms of the criteria as set forth in the 
manuals. 


Table 1

Means and Standard Deviations of
Experimental Groups

                  WISC                 WB
Scale          M       SD          M       SD

11½ Years*
VS           102.98   13.36       94.90   14.25
PS           103.63   15.11      108.50   14.74
FS           103.65   14.25      102.15   14.87

14½ Years*
VS           108.00   14.21      102.58   15.08
PS           108.25    9.68      110.23   10.84
FS           108.55   10.40      106.93   12.79

* The actual mean ages for these groups were 11 years,
6½ months and 14 years, 7 months.


Results 


At both the 11½- and the 14½-year age levels,
higher mean IQ’s were obtained on the
VS and FS of the WISC than on the WB, 
whereas the mean IQ of the PS of the WB 
was higher than that of the WISC at both of 
these age levels, as shown in Table 1. The 
higher means on the VS and FS of the WISC 
and higher mean PS of the WB parallel the 
findings of Graham (2) and Vanderhost et al. 
(4), but do not agree with Delattre and Cole 
(1). 

The initial comparisons involved testing for 
significance of differences between the experi- 
mental groups and the groups that were used 
by Wechsler to standardize both tests. Criti- 
cal ratios between means of the experimental 
and standardization groups and the F ratios 
between variances for these groups are given 
in Table 2. 
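
A critical ratio between two independent means and an F ratio between two variances can be computed as sketched below. The experimental values are the WISC Verbal Scale figures for the 11½-year group in Table 1; the standardization mean, SD, and N are placeholders chosen for illustration (the subgroup values the authors actually used are not given in the text), so the output is not expected to reproduce Table 2.

```python
import math


def critical_ratio(m1, sd1, n1, m2, sd2, n2):
    """Difference between two independent means divided by its standard error."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return abs(m1 - m2) / standard_error


def variance_ratio(sd1, sd2):
    """F ratio with the larger variance in the numerator."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    return max(v1, v2) / min(v1, v2)


# Experimental group: WISC VS at 11½ years (Table 1).
exp_mean, exp_sd, exp_n = 102.98, 13.36, 40

# Standardization group: hypothetical placeholder values, not from the source.
std_mean, std_sd, std_n = 100.0, 15.0, 200

cr = critical_ratio(exp_mean, exp_sd, exp_n, std_mean, std_sd, std_n)
f = variance_ratio(exp_sd, std_sd)
print(f"critical ratio = {cr:.2f}, F = {f:.2f}")
```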

In order to test the hypothesis of lack of 
equivalence between two tests it is necessary 
to establish criteria that will define what is 
meant by equivalence. A technique used by 
Hollingsworth (3) for determining such cri- 
teria was employed. This method specifies 
two conditions that must be met in order to 
state that the two tests are equivalent. The 
first condition is that individuals should ob- 
tain essentially the same ranking on the VS, 
PS, and FS on one test as they obtained on 
the other allowing for chance error. 


In determining whether or not the first condition 
has been met by the data, it is necessary to agree 
upon some arbitrary value of r for a specific cri- 
terion. In arriving at this arbitrary value for r it 
seemed that a relationship between the standard 
error of measurement of the WISC and the standard
error of estimating WISC scores from WB scores
would prove to be the most satisfactory. Since a per- 
fect r between the two tests within each age group 
would be neither demanded nor obtainable, it was 
decided that “the desired r would be of such mag- 
nitude that the standard error of estimate would be 
not greater than 1.5 times the standard error of 
measurement” (3, p. 14) of the WISC. The follow- 
ing equation was set up to show this relationship: 


$$S_{\mathrm{WISC}}\sqrt{1 - r_{w.w}^{2}} = 1.5\, S_{\mathrm{WISC}}\sqrt{1 - r_{\mathrm{WISC\ reliability}}},$$


where $r_{w.w}$ is the desired r between the WISC and
the WB. After solving this equation algebraically for
$r_{w.w}$ the desired r was computed for all three scales
at both age levels. To satisfy the first condition the 
desired r should lie between the obtained r and the 
upper 5% confidence limit for this obtained r. Table 
3 indicates the results obtained in meeting this first 
condition. 
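
The first condition can be sketched in a few lines. Solving the stated relationship (standard error of estimate equal to 1.5 times the standard error of measurement) gives a desired r of sqrt(1 - 2.25(1 - reliability)). The reliability of .90 below is a hypothetical input, the upper limit is computed by Fisher's z transformation with a two-sided 95% interval (an assumption; the text does not say how the limit was obtained), and the condition is read as failing only when the desired r lies above that upper limit.

```python
import math


def desired_r(reliability: float, factor: float = 1.5) -> float:
    """r at which the standard error of estimate equals `factor` times
    the standard error of measurement."""
    radicand = 1 - factor ** 2 * (1 - reliability)
    if radicand < 0:
        raise ValueError("reliability too low for a real-valued desired r")
    return math.sqrt(radicand)


def upper_limit(r: float, n: int, z: float = 1.96) -> float:
    """Upper confidence limit for r via Fisher's z (assumed method)."""
    return math.tanh(math.atanh(r) + z / math.sqrt(n - 3))


def meets_first_condition(upper: float, desired: float) -> bool:
    """True when the desired r does not exceed the upper limit of the obtained r."""
    return desired <= upper


print(round(desired_r(0.90), 3))        # hypothetical reliability of .90
print(round(upper_limit(0.85, 40), 3))  # obtained r of .85 with N = 40

# Two rows as printed in Table 3 (upper limit, desired r).
print(meets_first_condition(upper=0.94, desired=0.94))  # FS, 11½ years
print(meets_first_condition(upper=0.88, desired=0.93))  # FS row marked with a dagger
```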


The second condition necessary for equiva- 
lence is that each individual should obtain 
essentially the same IQ scores on one test as 
he obtained on the other, also allowing for 
chance error. In meeting this second condi- 
tion, a technique devised by Wilks (7) was 


Table 2

Comparisons Between Experimental and
Standardization Groups

                    WISC                           WB
           Sig. of diff.  Sig. of diff.  Sig. of diff.  Sig. of diff.
           between        between        between        between
Scale      means          variances      means          variances

11½ Years
VS         1.21           1.11           1.72           1.35
PS         1.51           2.11**         2.54*          1.38
FS         1.27           2.21*           .44           1.12

14½ Years
VS         3.28           1.34            .88           1.13
PS         4.97**         1.29           4.04**         1.70
FS         3.98**         3.88**         2.61[?]        1.35[?]

* Significant beyond the 5% level.
** Significant beyond the 1% level.

Note. In the Experimental Groups, N = 40 for both tests
at both age levels. In the WISC Standardization Groups,
N = 200 at both age levels; in the WB Standardization Groups,
N = 60 at the 11½-year age level, and N = 70 at the 14½-year
level. In testing the significance of differences between
variances, df = N − 1.


Table 3

Comparisons Between WISC and WB of the
Experimental Groups

           Obtained   Upper 5% confidence    Desired
Scale      r          limit for obtained r   r         L_mvc    L_vc     L_m

[14½ Years]
PS         .41        .64                    [remaining entries partly illegible: .957, .970]
FS         .78        .88                    .93†      .918     .958     .958

* Significant beyond the 5% level.
** Significant beyond the 1% level.
† Failed to meet the first condition.


employed wherein two or more alternate
forms of a test may be tested for equivalence.
The initial procedure ($L_{mvc}$) involves testing
the means, variances, and covariances simultaneously
for equivalence. Provided the null
hypothesis is rejected at the desired level of
confidence, the formulation is then fragmented
to determine whether the means, the
variances, or the covariances are contributing
to this lack of equality and if so to what degree.
In the present investigation the test for
variance-covariance ($L_{vc}$) reduces to simply
a test between variances since there are only
two groups present. However, the correlational
term in the covariance has been taken
into account in the first condition above.
Wilks has demonstrated that the test for
means ($L_{m}$) is equivalent to an analysis of
variance design with the tests (WISC and
WB) as columns and the individuals as rows.
Consequently, the over-all test for equivalence
($L_{mvc}$) is a powerful and appropriate
test to be used in the initial phase of the investigation.
If there are no significant differences
using this initial test, it is highly unlikely
that the follow-up tests ($L_{vc}$ and $L_{m}$)
would be significant. The values for all three
of these tests range from zero (highly significant)
to one (no significance). Wilks’
tables for small samples (7, pp. 263-264)
were used in determining the significance of
the values given in Table 3.
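
The analysis-of-variance design that the test for means is said to be equivalent to (tests as columns, individuals as rows) can be illustrated as follows. This sketch does not compute Wilks' criteria themselves; it only shows the subjects-by-tests layout and checks that, with two tests, the F for columns equals the square of the paired t. The six pairs of scores are invented for illustration.

```python
import math


def paired_t(x, y):
    """Paired t statistic for two scores per individual."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def columns_f(x, y):
    """F for columns (tests) in a two-way layout without replication,
    with individuals as rows and the two tests as columns."""
    x, y = list(x), list(y)
    n = len(x)
    grand = (sum(x) + sum(y)) / (2 * n)
    col_means = [sum(x) / n, sum(y) / n]
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_rows = 2 * sum((m - grand) ** 2 for m in row_means)
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_resid = ss_total - ss_cols - ss_rows
    df_cols, df_resid = 1, n - 1
    return (ss_cols / df_cols) / (ss_resid / df_resid)


# Hypothetical IQs for six individuals on the two tests.
wisc = [103, 98, 110, 95, 107, 101]
wb = [97, 99, 104, 90, 108, 95]

t = paired_t(wisc, wb)
f = columns_f(wisc, wb)
print(f"paired t = {t:.3f}, t squared = {t * t:.3f}, F for columns = {f:.3f}")
```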

Instances where a lack of equivalence exists 
between scales of the WISC and WB in re- 
spect to the first and/or the second condition 
and where the hypothesis is supported are 
shown in Table 3 by the reference symbols 
*, **, and †.


Discussion 


The values for the standardization groups 
used in the comparisons between experimen- 
tal and standardization groups were taken 
from the age groupings as given by Wechsler 
(5, 6) and were the closest obtainable to the 
ages of the experimental groups but did not 
coincide exactly with these ages. In the case 
of the WISC, the values were from two of 
the standardization subgroups whose mean 
ages were 10½ and 13½ years respectively (6,
pp. 11-13). A difference of a year therefore 
existed between the mean ages of the stand- 
ardization groups and the mean ages of the 
experimental groups whose actual mean ages 
were 11 years, 6½ months and 14 years, 7
months. In the case of the WB, the values 
used were from subgroups whose mean ages 
were 11 and 14 years (5, pp. 122-123). The 
age difference is a possible but unlikely 
causal factor in reference to the differences 
obtained between experimental and standardi- 
zation groups (Table 2). In addition, Wech- 
sler included in the standardization popula- 
tion Ss from the lower end of the distribution 
of intelligence, whereas this was not done in 
the case of the experimental groups. Hence, 
one would expect to obtain significant dif- 
ferences in a comparison between the two 
groups. On the other hand, greater variance 
occurs about an equal number of times in 
both the experimental and standardization 
groups which is what would be desired in a 
comparison of this nature. 

Although this limited investigation has 
shown some definite results in terms of the 
specific criteria set forth, it is felt that before
equivalence or lack of equivalence between
the WISC and WB Form I within this
age range can be assumed, further study 
should be done with the subtests of these two 
tests. The present investigation omitted statistical
comparisons between subtests because
of the lack of comparability of the WISC
subtests with the WB subtests in any way
other than with the use of a rank-order
type of procedure. Raw scores are assigned
weighted score values on the WB subtests
prior to the application of the established
age norms to the data. On the WISC, however,
raw subtest scores are assigned weighted
values and age norms are taken into consideration
simultaneously. Thus on the WB, a
set of weighted subtest scores for any one individual
would be the same as the weighted
scores for another individual of a different
age who gave the same responses given by
the first individual. The opposite holds true
in the case of the WISC. Two individuals of
different ages giving identical responses on
all of the subtests would receive different
weighted scores. Until a method is devised
that permits statistical comparison of the
WISC subtests with the WB subtests, it is
felt that any conclusions drawn concerning
equivalence of these two tests as a whole will
be tenuous.


Table 3 (continued)

           Obtained   Upper 5% confidence    Desired
Scale      r          limit for obtained r   r             L_mvc     L_vc          L_m

11½ Years
VS         .85        .92                    [illegible]   .466**    .998          .467**
PS         .79        .88                    .87           .797*     [illegible]   .798**
FS         .89        .94                    .94           .952      .998          .954

[14½ Years]
VS         [illegible] [illegible]           [illegible]   .509**    [illegible]   [illegible]**


Summary


The hypothesis tested in this study stated
that the WISC and WB Form I are not
equivalent tests with respect to the VS, PS,
and FS in the age range of 10 to 15 years.
In testing this hypothesis, a sample of 40
white, American Ss at the 11½-year age level
and 40 at the 14½-year age level was used.
Three criteria for equivalence were specified
in which (a) a desired r between the VS, PS,
and FS of the two tests was established, and
the differences between (b) means, and (c)
variances of the three scales were tested for
significance. The hypothesis of lack of equivalence
was supported to the degree the data
failed to meet these three criteria. At the
11½-year age level, a lack of equivalence was
established on the VS on criteria (a) and (b),
and on the PS on only criterion (b). No lack
of equivalence was found on the FS. At the
14½-year age level, lack of equivalence was
established on the VS on criterion (b) and
on the PS and FS on criterion (a).


Received March 24, 1955. 


Diagnostic Accuracy or Diagnostic Stereotypy?


C. H. Patterson 
Veterans Administration Regional Office, St. Paul, Minnesota 


In the April, 1954, issue of this Journal, 
there appeared an article by Albert Kostlan 
entitled “A method for the empirical study 
of psychodiagnosis” (9). This study con- 
cluded that diagnostic inferences which were 
“better than chance” were possible using two 
tests and a social case history, and also using 
certain “minimal data” consisting of age, 
marital status, occupation, education, source 
of referral, and the fact that the subject was 
a mental hygiene clinic patient (italics mine). 
The original report should be consulted for 
details of the study. 

While this study is ingenious in design, it 
would appear that there are two criticisms 
which affect the conclusions drawn. 

1. The measure of chance success was 
taken to be 50 per cent correct in designat- 
ing statements about a subject as true or 
false. Since significantly more than 50 per 
cent of two sets of statements were correctly 
labeled by the 20 psychologists participating 
in the experiment when only the “minimal 
data” were available, it is concluded that 
such data “allow inferences to be made which 
are more accurate than those expected on the 
basis of chance alone. Hence it follows that 
all the batteries allow inferences to be made 
more accurately than on the basis of chance 
alone, because the most errors were made un- 
der the condition of ‘minimal’ data” (9, pp. 
86-87). 

It is suggested here that the proportion cor- 
rectly labeled on the basis of the knowledge 
that the subject was a clinic patient, which 
knowledge was possessed by all judges in 
every condition, should be used as the chance 
expectation. It would appear that only if this 
knowledge were not available, and some of 
the subjects were not in fact patients, could 
50 per cent be taken as chance expectation. 
Inferences made on the basis of tests and the
social case history should then be compared 
with the 77 per cent and 68 per cent correctly 
labeled on the basis of the “minimal data” 
to determine whether the results were better 
than chance. These comparisons were actually 
made by Kostlan to evaluate the relative 
values of the tests and case history, but are 
affected by the criticism mentioned below. 

Kostlan states that “there was an expecta- 
tion that clinicians can make inferences which 
are more accurate than chance simply by 
knowing that a person is a psychiatric pa- 
tient” (9, p. 84). It is contended here that 
this is the chance expectation, rather than a 
pure 50 per cent. It is suggested that it is not 
an indication of diagnostic skill that psy- 
chologists, knowing that a subject is a clinic 
patient, can correctly label significantly more 
than 50 per cent of true-false statements 
taken from psychologists’ reports and thera- 
pists’ notes. Such statements tend to be gen- 
eral in nature and frequently psychological 
or psychiatric stereotypes or clinical clichés. 
The greater the number of such statements, 
which can be made about all clinic patients, 
the greater will be the percentage correctly 
labeled. The results would then indicate only 
that psychologists agree that psychiatric pa- 
tients, in general, can be described by certain 
statements or stereotypes. 
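
The argument about chance expectation can be made concrete with a small calculation. Suppose some proportion of the true-false statements holds for clinic patients in general; a judge who knows only that the subject is a patient, and therefore marks such statements "true," will exceed 50 per cent without any individual diagnosis. The base rate of .70 below is a hypothetical figure, not one taken from Kostlan's data.

```python
def expected_accuracy(base_rate_true: float, p_say_true: float) -> float:
    """Expected proportion of statements labeled correctly by a judge who
    answers "true" with probability p_say_true, when a proportion
    base_rate_true of the statements is in fact true of the subject."""
    return p_say_true * base_rate_true + (1 - p_say_true) * (1 - base_rate_true)


BASE_RATE = 0.70  # hypothetical share of statements true of clinic patients in general

coin_flip = expected_accuracy(BASE_RATE, 0.5)    # guessing without any knowledge
always_true = expected_accuracy(BASE_RATE, 1.0)  # relying only on "is a patient"

print(f"coin flip: {coin_flip:.2f}, stereotype strategy: {always_true:.2f}")
```

On this reading, the accuracy of the stereotype strategy, rather than 50 per cent, is the baseline against which test-based inferences would need to be compared.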

2. The conclusions that the use of tests 
and social case history data results in sig- 
nificantly better “diagnosis” than that ob- 
tained from use of only the “minimal data,” 
and (using the therapists’ statements) that 
using the MMPI and either the Rorschach or 
the Stein Sentence Completion, and the social 
case history, results in significantly better 
“diagnosis” than use of all three tests with- 
out the social case history data, are based 
upon t tests of the differences between these
various conditions of judging. The F test for 
all conditions taken together was significant. 

The proper procedure for making individual 
comparisons following an analysis of variance 
has been a matter of concern to many statis- 
ticians for some time. Although Guilford (6), 
McNemar (11) and Lindquist (10) approve 
the use of the usual t test for individual comparisons
following the finding of a significant
F, other competent statisticians do not agree 
(1, 2, 3, 4, 5, 7, 13, 14, 18). It seems clear
that the ordinary t test
is not appropriate where the comparisons are 
not random, but selected, as in the case of 
comparing the largest with the smallest ob- 
tained means; or in the case where all possible 
comparisons are made, in which case a certain 
proportion of the results, determined by the 
significance level selected, will reach signifi- 
cance by chance. 

A number of solutions to this problem have 
been proposed since the first approximate, in- 
tuitive suggestion of Fisher in 1935 (5, p. 57). 
Fisher suggested that the number of possible 
comparisons be taken into consideration, so 
that the required probability for significance 
at the .05 level, for example, would not be 1 
in 20, but 1 in 20 [½k(k − 1)], where k is the
number of variables being compared. This 
suggestion has been developed by several 
writers, and is mentioned in a few texts (1, p. 
239; 4, pp. 329-330; 7, p. 234), and was re- 
ferred to by the present writer in an earlier 
note (12). The application of this procedure 
in the study under discussion, for example, 
where 10 comparisons are involved, would re- 
quire that for a difference to be significant at 
the .05 level it would need to reach signifi- 
cance at the .005 level in the ordinary ¢ tables. 

Other solutions to this problem have been 
proposed recently. Tukey in 1949 (14) offered 
a solution which he later abandoned. This 
method has been used in several recent studies. 
Lindquist refers to it (10, p. 95, footnote), 
and Edwards in his recent text presents the 
method (4, pp. 330-335). It may therefore 
be expected that this now discarded method 
will persist in use for some time. In 1951 
Tukey introduced another method (15), and 
later suggested several others, for various 
types of conditions (16). 


C. H. Patterson 


Scheffé (13) has also considered this prob- 
lem and introduced a method which is ap- 
plicable where the interest is in all possible 
comparisons of pairs and combinations. He 
suggests Tukey’s later method for use where 
the interest is only in the ½k(k−1) individual
comparisons. Scheffé’s method is given by 
Jones in his chapter in the 1955 Annual Re- 
view of Psychology (8). 

Most recently Duncan (3) has surveyed 
existing proposals, finding them inefficient, 
and has introduced his own solution. His 
method appears to avoid the overconservatism 
of most of the other solutions. It is an ex- 
tremely simple method to use. However, it 
has been developed only for the case of equal 
N’s in the subgroups, and needs to be ex- 
tended to the case of unequal N’s. 

This is not the place to evaluate the merits 
and appropriate conditions for the use of 
these various methods. The writer is presently 
preparing a paper on this problem. It is only 
desired to take this opportunity to call them 
to the attention of psychologists having oc- 
casion to analyze individual comparisons fol- 
lowing the F test. 

In the study under discussion it would 
therefore appear that adequate tests of the 
effects of the different conditions were not 
made, so that the significance of the differ- 
ences, and therefore the validity of the con- 
clusions, is not known. It is not the writer’s 
purpose to single out the author for particular 
censure, since the same procedure has been 
and is being widely used and since, at the 
time of the study, adequate techniques were 
not available, although Fisher’s proposal and 
Tukey’s 1949 method were. It is to be hoped 
that appropriate solutions will soon be avail- 
able for all circumstances where psychologists 
and statisticians have been plagued by the 
awareness of the inapplicability of the usual 
t test.


Received May 6, 1955. 
A Reply to Patterson


Albert Kostlan 
VA Hospital, Oakland, California 


C. H. Patterson (5) raises some issues, by 
implication, which are of importance to the 
research-disposed clinician. His first criticism 
is well taken. I differ, however, with the impli- 
cation of the title, that stereotypy connotes 
inaccuracy. What is the diagnostic process if 
it is not the ordering of the patient from a 
supraordinate stereotype (i.e., class of events) 
to more subordinate ones as increments of in- 
formation are accumulated and the most prob- 
able idiographic inferences emerge? Where, 
along this continuum, does stereotypy end? 
The present-day nosological categories, for all 
their disadvantages, are highly useful, em- 
pirically derived, stereotypes. Patterson sug- 
gests that diagnostic skill was not indicated by 
the condition of “minimal data.” The ques- 
tion remains open, but it is plausible that 
if minimal information would permit valid 
stereotypy, highly accurate inferences could 
be extrapolated from the stereotype. The 
problem has relevance to the current interest 
in statistical versus clinical prediction (2). 
Statistical prediction and its logical extension, 
“cookbook” diagnosis (3), is a special case 
of stereotypy. 

Patterson’s second criticism is the more 
serious. I cannot take issue with the sta- 
tistical arguments he presents, but I do feel 
that in a heuristic or scanning experiment 
(few researches in clinical psychology can be 
considered crucial), it is most reasonable to 
accept the authority of the more liberal statis- 
ticians. Of more relevance are the limitations 
of the experimental designs themselves, so 
that even “real” differences must be seen as 
presumptive real differences—to be studied 
further. What confronts us here is a situation 
similar to the old problem of choosing be- 
tween Type I or Type II errors (4, p. 69). In 
the case of formal studies of our own psycho- 
diagnostic processes, we do not have enough 
bathwater as it is, without running the risk 
of throwing away a baby—even occasionally. 


In this context, it may also be appropriate 
to differentiate between statistical and prag- 
matic significance. In a recent cross-validation 
study of several systems for detecting organic 
brain pathology by means of the Rorschach, 
it was found that agreement between the 
criterion and one of the systems on a four- 
fold table yielded a phi coefficient of .32 (1), 
which was found to be highly significantly dif- 
ferent from zero. However, statisticians will 
tell us that such a correlation indicates the 
Rorschach will do only about ten per cent 
better than tossing a coin. A clinician, on the 
other hand, is pleased to note that if he finds 
an “organic” record, he can be about 95 per 
cent certain that the patient has an organic 
lesion! Whither significance? In the case of 
the study Patterson discusses, the greatest 
difference that emerged (with a p of less than 
.01) was approximately six per cent fewer 
errors in favor of one battery. This is hardly 
of importance when the limited generaliza- 
bility of the results is considered. When spe- 
cific studies, notably in clinical psychology, 
are examined from these standpoints, the 
statistical objection that Patterson raises be- 
comes academic. 
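
The contrast drawn above between a modest phi coefficient and a
high proportion of confirmed "organic" records can be illustrated
with a fourfold table. The counts below are hypothetical and are
not the data of the cross-validation study cited (1); the
function names and cell layout are likewise assumptions made for
this sketch.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table laid out as
         a = sign present, lesion present   b = sign present, lesion absent
         c = sign absent,  lesion present   d = sign absent,  lesion absent
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


def proportion_confirmed(a, b):
    """Of the records showing the sign, the proportion with a lesion."""
    return a / (a + b)


if __name__ == "__main__":
    # Hypothetical counts chosen only to show that the two
    # indices can diverge; they are not taken from any study.
    a, b, c, d = 19, 1, 40, 40
    print("phi:", round(phi_coefficient(a, b, c, d), 3))
    print("confirmed when sign present:", proportion_confirmed(a, b))
```

In this invented table the sign is rarely present, so overall
agreement with the criterion is low, yet nearly every record that
does show the sign is confirmed.
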
The Manifest Anxiety and ACE Scales and
College Achievement¹


Henry E. Klugh
Alma College

and A. W. Bendig
University of Pittsburgh


Although Matarazzo, et al. (2) reported a 
significant correlation of −.25 between MAS
and ACE scores (N = 101), Schulz and Cal- 
vin (3) found an insignificant r of .02 (N 
= 98). As additional evidence, we adminis- 
tered the MAS and also Gough’s Hr scale 
(1) to all students enrolled in the introduc- 
tory psychology course at the University of 
Pittsburgh during the Fall semester of 1954-55.
For 184 men and women Ss, data were
available on four variables: the ACE, MAS, 
and Hr scales and the Ss’ Quality Point Av- 
erages (QPA). Raw scores on the three tests 
were converted to stanines on the basis of 
previous norm groups and Pearsonian corre- 
lations between the pairs of variables were 
computed. In addition, multiple correlations 
between the test variables and the QPA cri- 
terion were obtained. 

The intertest correlations were: ACE and 
MAS, −.11 (not significant); ACE and Hr,
.29 (significant at .01 level); and Hr and
MAS, −.29 (significant at the .01 level).
The coefficients of each test with the QPA 
criterion were: ACE, .62; Hr, .32 (both sig- 
nificant at the .01 level); and MAS, .01. 
Comparison of the single order and multiple 
correlations indicated that (a) a combina-
tion of the ACE and Hr scales is a better
predictor of QPA than the ACE alone (.05
level), (b) adding the MAS to either the
ACE or the Hr scales did not significantly in-
crease the predictability of QPA, and (c)
adding the MAS to the ACE-Hr combination
did significantly (.05 level) increase the mul-
tiple correlation with QPA.


¹ An extended report of this study may be ob-
tained without charge from A. W. Bendig, Dept. of
Psychology, University of Pittsburgh, Pittsburgh 13,
Pa., or for a fee from the American Documentation
Institute. Order Document No. 4625 from ADI Aux-
iliary Publication Project, Photoduplication Service,
Library of Congress, Washington 25, D. C., remit-
ting in advance $1.25 for microfilm or $1.25 for
photocopies. Make checks payable to Chief, Photo-
duplication Service, Library of Congress.

Our ACE-MAS correlation results fall mid- 
way between those previously reported (2, 
3). A significant negative correlation between 
the Hr and MAS scales has also been found 
in previous samples and casts doubt upon the 
hypothesis that the MAS correlates nega- 
tively with timed tests and zero with un- 
timed tests (2), since the Hr scale is un- 
timed. Apparently when the ACE and Hr 
tests are combined the addition of the MAS 
will increase the validity of the battery by 
acting as a weak suppressor variable for the 
QPA-irrelevant variance in both the ACE
and Hr predictors. 
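
The suppressor effect described above can be illustrated from
the correlations reported in this note. The sketch below is an
assumption-laden reconstruction, not the authors' computation: it
uses the rounded coefficients as printed, the helper names are
hypothetical, and the resulting multiple correlations are not
values reported by the authors. No significance test is
attempted.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination
    with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def multiple_r(predictor_corr, validity):
    """Multiple correlation of a criterion with several predictors,
    from the predictor intercorrelations and the validities:
    R^2 = sum of (standardized weight * validity)."""
    betas = solve(predictor_corr, validity)
    r_squared = sum(b * v for b, v in zip(betas, validity))
    return r_squared ** 0.5


# Rounded coefficients as printed in the note (order: ACE, Hr, MAS).
ACE_HR = [[1.0, 0.29],
          [0.29, 1.0]]
ACE_HR_MAS = [[1.0, 0.29, -0.11],
              [0.29, 1.0, -0.29],
              [-0.11, -0.29, 1.0]]
VALIDITY_2 = [0.62, 0.32]        # ACE, Hr with QPA
VALIDITY_3 = [0.62, 0.32, 0.01]  # ACE, Hr, MAS with QPA

if __name__ == "__main__":
    print("R (ACE + Hr):      ", round(multiple_r(ACE_HR, VALIDITY_2), 3))
    print("R (ACE + Hr + MAS):", round(multiple_r(ACE_HR_MAS, VALIDITY_3), 3))
```

Although the MAS correlates only .01 with QPA, its negative
correlations with the ACE and Hr scales allow it to raise the
multiple correlation slightly in this reconstruction.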


Brief Report 
Received June 23, 1955. 
Editorial Note


After this issue, the Journal of Consulting Psy- 
chology will publish no more reviews of books. The 
task of reviewing books in all fields of psychology 
will be assumed by a new journal, Contemporary 
Psychology, A Journal of Reviews. Sponsored by 
the American Psychological Association, the new 
periodical will make its first appearance in 1956, 
under the editorship of Edwin G. Boring of Har- 
vard University. 

The Journal of Consulting Psychology will con- 
tinue to review psychological tests, whether they are 
published as paper pamphlets, or as “tests in hard 
covers”—books which are manuals for specific tests. 
The plans of the journal contemplate two stages of 
test reviews. Brief notices, prepared mainly by the 
Editor, will appear promptly as in the past. Se- 
lected tests of above-average importance will later 
be the subjects of more extended and critical re- 
views, prepared by specialists on the invitation of 
the Editor. 

The reviews which appear in the present issue 
were written by the Advisory Editors, who may be 
identified by their initials. 


Books 


Baldwin, Alfred L. Behavior and development in 
childhood. New York: Dryden Press, 1955. Pp. 
xvii + 619. $6.25. 


In his Preface the author states that he attempts 
“to make explicit a theoretical framework which 
can be of help in predicting how children behave 
and how they develop.” The writer justifies this by 
drawing the reader’s attention to the fact that chil- 
dren are important, by noting that the facts of be- 
havior do not speak for themselves, and by indi- 
cating that such facts must be seen in a theoretical 
context before they can lead to valid predictions. 
Although the author offers his work as a theoretical 
text, it is certainly not a text of theory. As a mat- 
ter of fact, a theory of development of behavior re- 
ceives no particular emphasis until chapter 22 be- 
gins on page 537; analogies, schemata, and other 
conceptual devices are employed in most of the 
earlier chapters. The book is divided into two parts. 
Part One has to do with how the child behaves; 
Part Two is concerned with prediction of person- 
ality change. The presentation of the material is 
systematic, well-illustrated, and in the style of an
elementary textbook. Nevertheless, because of the
author’s rather comprehensive treatment of child 
behavior, the book is likely to interest the graduate 
student as well as the undergraduate student. Al- 
though there may be some divergence of opinion 
concerning the desirability of describing Baldwin’s 
work as a theoretical text, it is probable that almost 
all readers will regard it as a desirable addition to 
the textbooks currently available in the field of 
child development.—J. R. W. 


Balint, Alice. The early years of life: A psychoana- 
lytic study. New York: Basic Books, Inc., 1954. 
Pp. vii + 149. $3.00. 


Twenty years ago, the English psychoanalyst Alice 
Balint wrote a book (The Psychoanalysis of the 
Nursery) interpreting child behavior to parents and 
teachers. The present volume is a republication of 
this earlier work, designed to inform those who deal 
with children of some of the principles of child 
behavior and development. The book must have 
seemed much more novel two decades ago than it 
will today, for many of the principles presented as 
new then are well accepted now. Still, it is a 
straightforward, informative account of child devel- 
opment as the British psychoanalyst sees it. Chap- 
ter titles such as “The Oedipus Complex,” “The 
Castration Complex,” and “Identification” serve to 
indicate the consistently psychoanalytic viewpoint 
from which the book is written. Many sophisticated 
parents and many professional workers with chil- 
dren will find this volume a welcome review of psy-
choanalytic fundamentals.—A. M. G.


Burt, Sir Cyril. The subnormal mind. (3rd Ed.) 
London and New York: Oxford Univer. Press, 
1955. Pp. xx + 391. $5.00. 

The third edition of the now almost classic British 
volume by Burt has been broadened to include con- 
temporary findings from the fields of clinical and 
child psychology. These welcome additions render 
the original title somewhat misleading, for the book 
covers normality, delinquency, and neurosis as well 
as mental deficiency and dullness. The mentalistic 
frame of reference within which these topics are 
treated makes this volume quite different from the 
usual, and particularly attractive to readers who 
have tired of a strictly behavioral approach, or who 
are concerned with the implications of the concept 
of mind for problems of educational guidance. De-
spite a good bit of forensic material which is unique
to the English educational system, the book has 
much to offer to all those interested in problems of 
childhood.—A. M. G. 


Bush, Robert R., & Mosteller, Frederick. Stochastic 
models for learning. New York: Wiley, 1955. Pp. 
364. $9.00. 

The authors consider learning to be any system- 
atic process of behavior change which eventually 
reaches a condition of relative stability. This very 
general, descriptive concept of learning is expressed 
in terms of a statistical model which describes but 
does not seek to explain the process of behavior 
change. The model in effect describes a behavioral 
sequence by expressing the probability of a future 
response in a sequence as a function of the fre- 
quency of certain prior responses. The conceptual 
basis for this general model appears to be simple 
enough, and the authors’ presentation is not dis- 
couragingly complicated. This discussion is particu- 
larly admirable because it provides descriptions 
which involve very few parameters. Since the au- 
thors go directly to their task and do not linger in 
obeisance before the sacred cows of learning theory, 
their approach is, at the very least, refreshing and 
may stimulate investigators to use new designs in 
procedure and analysis which could result in early 
increments to our understanding of the process of 
behavior change.—J. R. W.


Eissler, Ruth, et al. (Eds.). The psychoanalytic 
study of the child. Vol. IX. New York: Interna- 
tional Universities Press, 1954. Pp. 369. $7.50. 

This volume begins with two contributions from
the New York Psychoanalytic Society’s 1954 meet-
ing to consider problems of infantile neurosis. An
impressive group of child analysts, headed by Anna
Freud, discusses this complex topic, and their re-
corded comments are probably the outstanding fea-
ture of this book. Of the remaining papers, mention
should be made of Stevenson’s study of children’s
treasured possessions, Sperling’s presentation of a
child’s imaginary companion, and the Planks’ paper
on autobiographical accounts of emotional aspects
of learning arithmetic. Both individual and group
residential methods for treating disturbed children
find space in the section on techniques. As con-
trasted with earlier volumes of the series, the 1954
book seems to contain more papers on rather spe-
cialized topics, and fewer considerations of funda-
mental theory of child development.—A. M. G.


Greenacre, Phyllis. Swift and Carroll: A psycho- 
analytic study of two lives. New York: Interna- 
tional Universities Press, 1955. Pp. 306. $5.00. 

Concerned with the lives and literary careers of
the creators of Gulliver and Alice, this little book
had its origins in the clinical impression of a prac-
ticing analyst that fetishistic perversions are asso-
ciated with distortions of the body image. The pres-
entation of these sensations of change in bodily size
or the size of various bodily parts in the books of
two enigmatic artists presented a problem in both 
literary criticism and psychodynamics. In dealing 
with this challenge, Dr. Greenacre has many sug- 
gestive and stimulating things to say about the de- 
fensive functions of the self image, the psychological 
determinants of satire and nonsense as forms of 
humor, and the relationship between impulse-anxiety 
and literary creativity. Like most such work, Swift 
and Carroll provides new dimensions for the under- 
standing of books of lasting appeal and formulates 
new hypotheses for the comprehension of clinical 
complexities. Whether these hypotheses are sound or 
not is a question which awaits the work of investi- 
gators whose own ingenuity responds to Dr. Green-
acre’s.—E. J. S.


Heiser, Karl F. Our backward children. New York: 
Norton, 1955. Pp. 240. $3.75. 


In a book planned to give parents and the inter- 
ested public an over-all view of the problems that 
must be faced in diagnosing, training, and caring for 
the backward child, Heiser has tried to clear some 
of the confusion that parents often obtain from their 
contacts with specialists. He is interested in reliev- 
ing parents’ feelings of tension and anxiety, and giv- 
ing them insight into the needs of the whole child. 
The book is especially recommended to those par- 
ents who are asking: “What about my slow child? 
What can be done for him?” There is an excellent 
index.—B. M. L.


Lorand, Sandor (Ed.). Yearbook of psychoanalysis. 
Vol. 10. New York: International Universities 
Press, 1955. Pp. 277. $7.50. 


This tenth volume continues the tradition of the 
Yearbook of Psychoanalysis in presenting a wide va- 
riety of contemporary psychoanalytic papers. There 
is something here to attract any reader, although it 
is only the rare specialist in psychoanalytic theory 
who will find the majority compelling. Bernfeld’s 
article on Freud’s study of cocaine is illuminating, 
especially considered in conjunction with Jones’s 
recent biographies of Freud. Notes on the Wolf Man 
and on Schreber add to the recorded information on 
these now classic cases. Of the usual series of papers 
on dreams, the one by Lewin seems most note- 
worthy for ingenuity of argument and precision of 
definition. Kant, Mark Twain, and W. S. Gilbert 
are the subjects of interesting psychoanalytic in- 
vestigation. Other papers on ego function, resistance, 
children’s early object-attachments, and various 
symptom manifestations complete the volume.— 
A. M. G.


Michaels, Joseph J. Disorders of character. Spring- 
field, Ill.: Charles C Thomas, 1955. Pp. x + 148. 

The author “concludes” that there probably is a
special kind of psychosomatic disposition that per-
meates the delinquent individual and gives rise to
specific individuation at the biological, psychologi- 
cal, and sociological levels. Moreover, he “concludes” 
that the general pattern of lack of control which
characterizes persistent enuresis permeates the whole 
psychopathic personality. “Concludes” is a strong 
verb to use in sentences that are so sweepingly gen- 
eral, particularly on the basis of the data presented 
in this book. The volume, with a bibliography of 
235 titles, makes reference not only to the studies 
on persistent delinquency but also to the pertinent 
theoretical literature on personality and character. 
The reported relationship between apprehended de- 
linquency and persistent enuresis is shown, and is 
related to the phenomena of psychopathy. The au- 
thor assumes a typological viewpoint, and argues 
that psychopathic personality and compulsive neuro- 
sis are psychobiological contrasts. The data might 
well lead one to hypothesize the existence of cer- 
tain constitutional factors (among others) in de- 
linquency and other complex behavior. The specific 
nature of these relationships has yet to be shown, 
since there are many variables that must be con- 
trolled. One important uncontrolled variable in most 
of the studies presented here is early interpersonal 
relations—a variable not emphasized by the au-
thor.—F. McK.


Super, Donald E. Opportunities in psychology. New 
York: Vocational Guidance Manuals, 1955. Pp. 96. 
$1.15. 


This is one of a series of booklets designed for the 
intelligent choice and planning of a career. It might 
well serve as a model for occupational descriptions
of this kind. Super presents factual material which 
gives the student information of wide scope in a 
highly readable form. It should also be informative 
to those in the profession. He has presented the vo- 
cation in a manner which will probably satisfy, in 
the main, psychologists of widely different back- 
grounds and interests—a difficult undertaking. He 
has drawn upon most of the significant publications 
on the activities of the profession, its prospects, re- 
wards, fields, and organizations. The index and 
bibliography are brief and designed mainly for stu- 
dents. This volume serves a need of long duration 
for a booklet that may be handed to potential psy- 
chologists. In addition, it is a good contribution to 
the public relations of the profession.—F. McK.


Thorne, Frederick C. Principles of psychological ex- 
amining. Brandon, Vt.: Journal of Clinical Psy- 
chology, 1955. Pp. v + 494. $7.50. 

This textbook is designed as a complete guide to 
psychological diagnosis, based on the author’s eclectic 
approach, and essentially operational in nature. Psy-
chodiagnostic testing is relegated to a relatively
minor role; observational and interview techniques 
are preferred and examined in unusual and useful 
detail. Part One covers the theory and process of 
psychological examining; Part Two the study of fac- 
tors organizing personality integration. Each sub- 
topic discussed is exhaustively outlined, but a final 
synthesis is not achieved. The author’s insistence 
that mental health and good adjustment are more 
than just the absence of ill-health and defects, and 
require a positive integrative process, is refreshing 
and important.—A. R. 


Wallin, J. E. Wallace. Education of mentally handi- 
capped children. New York: Harper, 1955. Pp. 
xiii + 485. $4.50. 


Although there is no scarcity of publications on 
the mentally handicapped, this book is quite special. 
It is written by one who has been prominently as- 
sociated with the field of education of the mentally 
retarded from its very beginnings, more than fifty 
years ago. Tracing the origins of many current prac- 
tices, Wallin outlines briefly, but adequately, some 
of the major programs for the education of mentally 
handicapped children, paying particular attention to 
legislative provisions, administrative practices, teach- 
ing methods, and curricula. This volume has par- 
ticular interest for school psychologists, especially 
those of relatively recent vintage. Prior to the pro- 
fessionalization of school psychology, practitioners in 
“special education” frequently acquired psychological 
training, and many of them became the pioneers in 
school psychology. The modern school psychologist 
is well trained in clinical psychology, but often lacks 
close acquaintance with special education. Wallin’s 
book compensates for much of this lack. Extensive 
bibliographies add considerably to the usefulness of 
this volume.—M. K. 


Books Received 


De Kok, Winifred. You and your child. New York: 
Philosophical Library, 1955. Pp. vii + 147. $3.75. 


Farrell, B. A. (Ed.) Experimental psychology. New 
York: Philosophical Library, 1955. Pp. xi + 66.
$2.75. 

Sell, DeWitt E. (Ed.) Manual of applied correctional 
psychology. Columbus: Ohio Department of Men- 
tal Hygiene and Correction, no date (1955). Pp. v 
+ 68 (paper). 

Valentine, C. W. Parents and children. New York: 
Philosophical Library, 1955. Pp. xi + 212. $3.75. 